This report models the AmesHousing dataset with the tidymodels package and some of the other packages presented in class during the Programa Avançado em Data Science course at Insper (São Paulo, Brazil).
The goal is to predict the sale price of the houses in the AmesHousing dataset as accurately as possible.
Models: linear regression (with or without stepwise selection), LASSO, ridge regression, bagging, and random forest.
library(AmesHousing)
library(tidyverse)
library(tidymodels)
library(skimr)
library(GGally)
library(vip)
We load the data with make_ordinal_ames(), because it already stores some columns as ordered factors, which makes modeling easier. skim() gives an overview of what the dataset contains.
dados <- make_ordinal_ames()
skim(dados)
| Name | dados |
| Number of rows | 2930 |
| Number of columns | 81 |
| _______________________ | |
| Column type frequency: | |
| factor | 46 |
| numeric | 35 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| MS_SubClass | 0 | 1 | FALSE | 16 | One: 1079, Two: 575, One: 287, One: 192 |
| MS_Zoning | 0 | 1 | FALSE | 7 | Res: 2273, Res: 462, Flo: 139, Res: 27 |
| Street | 0 | 1 | FALSE | 2 | Pav: 2918, Grv: 12 |
| Alley | 0 | 1 | FALSE | 3 | No_: 2732, Gra: 120, Pav: 78 |
| Lot_Shape | 0 | 1 | TRUE | 4 | Reg: 1859, Sli: 979, Mod: 76, Irr: 16 |
| Land_Contour | 0 | 1 | TRUE | 4 | Lvl: 2633, HLS: 120, Bnk: 117, Low: 60 |
| Utilities | 0 | 1 | TRUE | 3 | All: 2927, NoS: 2, NoS: 1, ELO: 0 |
| Lot_Config | 0 | 1 | FALSE | 5 | Ins: 2140, Cor: 511, Cul: 180, FR2: 85 |
| Land_Slope | 0 | 1 | TRUE | 3 | Gtl: 2789, Mod: 125, Sev: 16 |
| Neighborhood | 0 | 1 | FALSE | 28 | Nor: 443, Col: 267, Old: 239, Edw: 194 |
| Condition_1 | 0 | 1 | FALSE | 9 | Nor: 2522, Fee: 164, Art: 92, RRA: 50 |
| Condition_2 | 0 | 1 | FALSE | 8 | Nor: 2900, Fee: 13, Art: 5, Pos: 4 |
| Bldg_Type | 0 | 1 | FALSE | 5 | One: 2425, Twn: 233, Dup: 109, Twn: 101 |
| House_Style | 0 | 1 | FALSE | 8 | One: 1481, Two: 873, One: 314, SLv: 128 |
| Overall_Qual | 0 | 1 | TRUE | 10 | Ave: 825, Abo: 732, Goo: 602, Ver: 350 |
| Overall_Cond | 0 | 1 | TRUE | 9 | Ave: 1654, Abo: 533, Goo: 390, Ver: 144 |
| Roof_Style | 0 | 1 | FALSE | 6 | Gab: 2321, Hip: 551, Gam: 22, Fla: 20 |
| Roof_Matl | 0 | 1 | FALSE | 8 | Com: 2887, Tar: 23, WdS: 9, WdS: 7 |
| Exterior_1st | 0 | 1 | FALSE | 16 | Vin: 1026, Met: 450, HdB: 442, Wd : 420 |
| Exterior_2nd | 0 | 1 | FALSE | 17 | Vin: 1015, Met: 447, HdB: 406, Wd : 397 |
| Mas_Vnr_Type | 0 | 1 | FALSE | 5 | Non: 1775, Brk: 880, Sto: 249, Brk: 25 |
| Exter_Qual | 0 | 1 | TRUE | 4 | Typ: 1799, Goo: 989, Exc: 107, Fai: 35 |
| Exter_Cond | 0 | 1 | TRUE | 5 | Typ: 2549, Goo: 299, Fai: 67, Exc: 12 |
| Foundation | 0 | 1 | FALSE | 6 | PCo: 1310, CBl: 1244, Brk: 311, Sla: 49 |
| Bsmt_Qual | 0 | 1 | TRUE | 6 | Typ: 1283, Goo: 1219, Exc: 258, Fai: 88 |
| Bsmt_Cond | 0 | 1 | TRUE | 6 | Typ: 2616, Goo: 122, Fai: 104, No_: 80 |
| Bsmt_Exposure | 0 | 1 | TRUE | 5 | No: 1906, Av: 418, Gd: 284, Mn: 239 |
| BsmtFin_Type_1 | 0 | 1 | TRUE | 7 | GLQ: 859, Unf: 851, ALQ: 429, Rec: 288 |
| BsmtFin_Type_2 | 0 | 1 | TRUE | 7 | Unf: 2499, Rec: 106, LwQ: 89, No_: 81 |
| Heating | 0 | 1 | FALSE | 6 | Gas: 2885, Gas: 27, Gra: 9, Wal: 6 |
| Heating_QC | 0 | 1 | TRUE | 5 | Exc: 1495, Typ: 864, Goo: 476, Fai: 92 |
| Central_Air | 0 | 1 | FALSE | 2 | Y: 2734, N: 196 |
| Electrical | 1 | 1 | TRUE | 5 | SBr: 2682, Fus: 188, Fus: 50, Fus: 8 |
| Kitchen_Qual | 0 | 1 | TRUE | 5 | Typ: 1494, Goo: 1160, Exc: 205, Fai: 70 |
| Functional | 0 | 1 | TRUE | 8 | Typ: 2728, Min: 70, Min: 65, Mod: 35 |
| Fireplace_Qu | 0 | 1 | TRUE | 6 | No_: 1422, Goo: 744, Typ: 600, Fai: 75 |
| Garage_Type | 0 | 1 | FALSE | 7 | Att: 1731, Det: 782, Bui: 186, No_: 157 |
| Garage_Finish | 0 | 1 | TRUE | 4 | Unf: 1231, RFn: 812, Fin: 728, No_: 159 |
| Garage_Qual | 0 | 1 | TRUE | 6 | Typ: 2615, No_: 159, Fai: 124, Goo: 24 |
| Garage_Cond | 0 | 1 | TRUE | 6 | Typ: 2665, No_: 159, Fai: 74, Goo: 15 |
| Paved_Drive | 0 | 1 | TRUE | 3 | Pav: 2652, Dir: 216, Par: 62 |
| Pool_QC | 0 | 1 | TRUE | 5 | No_: 2917, Goo: 4, Exc: 4, Typ: 3 |
| Fence | 0 | 1 | TRUE | 5 | No_: 2358, Min: 330, Goo: 118, Goo: 112 |
| Misc_Feature | 0 | 1 | FALSE | 6 | Non: 2824, She: 95, Gar: 5, Oth: 4 |
| Sale_Type | 0 | 1 | FALSE | 10 | WD : 2536, New: 239, COD: 87, Con: 26 |
| Sale_Condition | 0 | 1 | FALSE | 6 | Nor: 2413, Par: 245, Abn: 190, Fam: 46 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Lot_Frontage | 0 | 1 | 57.65 | 33.50 | 0.00 | 43.00 | 63.00 | 78.00 | 313.00 | ▇▇▁▁▁ |
| Lot_Area | 0 | 1 | 10147.92 | 7880.02 | 1300.00 | 7440.25 | 9436.50 | 11555.25 | 215245.00 | ▇▁▁▁▁ |
| Year_Built | 0 | 1 | 1971.36 | 30.25 | 1872.00 | 1954.00 | 1973.00 | 2001.00 | 2010.00 | ▁▂▃▆▇ |
| Year_Remod_Add | 0 | 1 | 1984.27 | 20.86 | 1950.00 | 1965.00 | 1993.00 | 2004.00 | 2010.00 | ▅▂▂▃▇ |
| Mas_Vnr_Area | 0 | 1 | 101.10 | 178.63 | 0.00 | 0.00 | 0.00 | 162.75 | 1600.00 | ▇▁▁▁▁ |
| BsmtFin_SF_1 | 0 | 1 | 4.18 | 2.23 | 0.00 | 3.00 | 3.00 | 7.00 | 7.00 | ▃▂▇▁▇ |
| BsmtFin_SF_2 | 0 | 1 | 49.71 | 169.14 | 0.00 | 0.00 | 0.00 | 0.00 | 1526.00 | ▇▁▁▁▁ |
| Bsmt_Unf_SF | 0 | 1 | 559.07 | 439.54 | 0.00 | 219.00 | 465.50 | 801.75 | 2336.00 | ▇▅▂▁▁ |
| Total_Bsmt_SF | 0 | 1 | 1051.26 | 440.97 | 0.00 | 793.00 | 990.00 | 1301.50 | 6110.00 | ▇▃▁▁▁ |
| First_Flr_SF | 0 | 1 | 1159.56 | 391.89 | 334.00 | 876.25 | 1084.00 | 1384.00 | 5095.00 | ▇▃▁▁▁ |
| Second_Flr_SF | 0 | 1 | 335.46 | 428.40 | 0.00 | 0.00 | 0.00 | 703.75 | 2065.00 | ▇▃▂▁▁ |
| Low_Qual_Fin_SF | 0 | 1 | 4.68 | 46.31 | 0.00 | 0.00 | 0.00 | 0.00 | 1064.00 | ▇▁▁▁▁ |
| Gr_Liv_Area | 0 | 1 | 1499.69 | 505.51 | 334.00 | 1126.00 | 1442.00 | 1742.75 | 5642.00 | ▇▇▁▁▁ |
| Bsmt_Full_Bath | 0 | 1 | 0.43 | 0.52 | 0.00 | 0.00 | 0.00 | 1.00 | 3.00 | ▇▆▁▁▁ |
| Bsmt_Half_Bath | 0 | 1 | 0.06 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 | ▇▁▁▁▁ |
| Full_Bath | 0 | 1 | 1.57 | 0.55 | 0.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▇▁▁ |
| Half_Bath | 0 | 1 | 0.38 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 2.00 | ▇▁▅▁▁ |
| Bedroom_AbvGr | 0 | 1 | 2.85 | 0.83 | 0.00 | 2.00 | 3.00 | 3.00 | 8.00 | ▁▇▂▁▁ |
| Kitchen_AbvGr | 0 | 1 | 1.04 | 0.21 | 0.00 | 1.00 | 1.00 | 1.00 | 3.00 | ▁▇▁▁▁ |
| TotRms_AbvGrd | 0 | 1 | 6.44 | 1.57 | 2.00 | 5.00 | 6.00 | 7.00 | 15.00 | ▁▇▂▁▁ |
| Fireplaces | 0 | 1 | 0.60 | 0.65 | 0.00 | 0.00 | 1.00 | 1.00 | 4.00 | ▇▇▁▁▁ |
| Garage_Cars | 0 | 1 | 1.77 | 0.76 | 0.00 | 1.00 | 2.00 | 2.00 | 5.00 | ▅▇▂▁▁ |
| Garage_Area | 0 | 1 | 472.66 | 215.19 | 0.00 | 320.00 | 480.00 | 576.00 | 1488.00 | ▃▇▃▁▁ |
| Wood_Deck_SF | 0 | 1 | 93.75 | 126.36 | 0.00 | 0.00 | 0.00 | 168.00 | 1424.00 | ▇▁▁▁▁ |
| Open_Porch_SF | 0 | 1 | 47.53 | 67.48 | 0.00 | 0.00 | 27.00 | 70.00 | 742.00 | ▇▁▁▁▁ |
| Enclosed_Porch | 0 | 1 | 23.01 | 64.14 | 0.00 | 0.00 | 0.00 | 0.00 | 1012.00 | ▇▁▁▁▁ |
| Three_season_porch | 0 | 1 | 2.59 | 25.14 | 0.00 | 0.00 | 0.00 | 0.00 | 508.00 | ▇▁▁▁▁ |
| Screen_Porch | 0 | 1 | 16.00 | 56.09 | 0.00 | 0.00 | 0.00 | 0.00 | 576.00 | ▇▁▁▁▁ |
| Pool_Area | 0 | 1 | 2.24 | 35.60 | 0.00 | 0.00 | 0.00 | 0.00 | 800.00 | ▇▁▁▁▁ |
| Misc_Val | 0 | 1 | 50.64 | 566.34 | 0.00 | 0.00 | 0.00 | 0.00 | 17000.00 | ▇▁▁▁▁ |
| Mo_Sold | 0 | 1 | 6.22 | 2.71 | 1.00 | 4.00 | 6.00 | 8.00 | 12.00 | ▅▆▇▃▃ |
| Year_Sold | 0 | 1 | 2007.79 | 1.32 | 2006.00 | 2007.00 | 2008.00 | 2009.00 | 2010.00 | ▇▇▇▇▃ |
| Sale_Price | 0 | 1 | 180796.06 | 79886.69 | 12789.00 | 129500.00 | 160000.00 | 213500.00 | 755000.00 | ▇▇▁▁▁ |
| Longitude | 0 | 1 | -93.64 | 0.03 | -93.69 | -93.66 | -93.64 | -93.62 | -93.58 | ▅▅▇▆▁ |
| Latitude | 0 | 1 | 42.03 | 0.02 | 41.99 | 42.02 | 42.03 | 42.05 | 42.06 | ▂▂▇▇▇ |
#?ames_raw
There appears to be one missing value in the Electrical column. Let us load the data with make_ames() and check, again with skim(), whether that value is missing there as well.
dados2 <- make_ames()
skim(dados2)
| Name | dados2 |
| Number of rows | 2930 |
| Number of columns | 81 |
| _______________________ | |
| Column type frequency: | |
| factor | 46 |
| numeric | 35 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| MS_SubClass | 0 | 1 | FALSE | 16 | One: 1079, Two: 575, One: 287, One: 192 |
| MS_Zoning | 0 | 1 | FALSE | 7 | Res: 2273, Res: 462, Flo: 139, Res: 27 |
| Street | 0 | 1 | FALSE | 2 | Pav: 2918, Grv: 12 |
| Alley | 0 | 1 | FALSE | 3 | No_: 2732, Gra: 120, Pav: 78 |
| Lot_Shape | 0 | 1 | FALSE | 4 | Reg: 1859, Sli: 979, Mod: 76, Irr: 16 |
| Land_Contour | 0 | 1 | FALSE | 4 | Lvl: 2633, HLS: 120, Bnk: 117, Low: 60 |
| Utilities | 0 | 1 | FALSE | 3 | All: 2927, NoS: 2, NoS: 1 |
| Lot_Config | 0 | 1 | FALSE | 5 | Ins: 2140, Cor: 511, Cul: 180, FR2: 85 |
| Land_Slope | 0 | 1 | FALSE | 3 | Gtl: 2789, Mod: 125, Sev: 16 |
| Neighborhood | 0 | 1 | FALSE | 28 | Nor: 443, Col: 267, Old: 239, Edw: 194 |
| Condition_1 | 0 | 1 | FALSE | 9 | Nor: 2522, Fee: 164, Art: 92, RRA: 50 |
| Condition_2 | 0 | 1 | FALSE | 8 | Nor: 2900, Fee: 13, Art: 5, Pos: 4 |
| Bldg_Type | 0 | 1 | FALSE | 5 | One: 2425, Twn: 233, Dup: 109, Twn: 101 |
| House_Style | 0 | 1 | FALSE | 8 | One: 1481, Two: 873, One: 314, SLv: 128 |
| Overall_Qual | 0 | 1 | FALSE | 10 | Ave: 825, Abo: 732, Goo: 602, Ver: 350 |
| Overall_Cond | 0 | 1 | FALSE | 9 | Ave: 1654, Abo: 533, Goo: 390, Ver: 144 |
| Roof_Style | 0 | 1 | FALSE | 6 | Gab: 2321, Hip: 551, Gam: 22, Fla: 20 |
| Roof_Matl | 0 | 1 | FALSE | 8 | Com: 2887, Tar: 23, WdS: 9, WdS: 7 |
| Exterior_1st | 0 | 1 | FALSE | 16 | Vin: 1026, Met: 450, HdB: 442, Wd : 420 |
| Exterior_2nd | 0 | 1 | FALSE | 17 | Vin: 1015, Met: 447, HdB: 406, Wd : 397 |
| Mas_Vnr_Type | 0 | 1 | FALSE | 5 | Non: 1775, Brk: 880, Sto: 249, Brk: 25 |
| Exter_Qual | 0 | 1 | FALSE | 4 | Typ: 1799, Goo: 989, Exc: 107, Fai: 35 |
| Exter_Cond | 0 | 1 | FALSE | 5 | Typ: 2549, Goo: 299, Fai: 67, Exc: 12 |
| Foundation | 0 | 1 | FALSE | 6 | PCo: 1310, CBl: 1244, Brk: 311, Sla: 49 |
| Bsmt_Qual | 0 | 1 | FALSE | 6 | Typ: 1283, Goo: 1219, Exc: 258, Fai: 88 |
| Bsmt_Cond | 0 | 1 | FALSE | 6 | Typ: 2616, Goo: 122, Fai: 104, No_: 80 |
| Bsmt_Exposure | 0 | 1 | FALSE | 5 | No: 1906, Av: 418, Gd: 284, Mn: 239 |
| BsmtFin_Type_1 | 0 | 1 | FALSE | 7 | GLQ: 859, Unf: 851, ALQ: 429, Rec: 288 |
| BsmtFin_Type_2 | 0 | 1 | FALSE | 7 | Unf: 2499, Rec: 106, LwQ: 89, No_: 81 |
| Heating | 0 | 1 | FALSE | 6 | Gas: 2885, Gas: 27, Gra: 9, Wal: 6 |
| Heating_QC | 0 | 1 | FALSE | 5 | Exc: 1495, Typ: 864, Goo: 476, Fai: 92 |
| Central_Air | 0 | 1 | FALSE | 2 | Y: 2734, N: 196 |
| Electrical | 0 | 1 | FALSE | 6 | SBr: 2682, Fus: 188, Fus: 50, Fus: 8 |
| Kitchen_Qual | 0 | 1 | FALSE | 5 | Typ: 1494, Goo: 1160, Exc: 205, Fai: 70 |
| Functional | 0 | 1 | FALSE | 8 | Typ: 2728, Min: 70, Min: 65, Mod: 35 |
| Fireplace_Qu | 0 | 1 | FALSE | 6 | No_: 1422, Goo: 744, Typ: 600, Fai: 75 |
| Garage_Type | 0 | 1 | FALSE | 7 | Att: 1731, Det: 782, Bui: 186, No_: 157 |
| Garage_Finish | 0 | 1 | FALSE | 4 | Unf: 1231, RFn: 812, Fin: 728, No_: 159 |
| Garage_Qual | 0 | 1 | FALSE | 6 | Typ: 2615, No_: 159, Fai: 124, Goo: 24 |
| Garage_Cond | 0 | 1 | FALSE | 6 | Typ: 2665, No_: 159, Fai: 74, Goo: 15 |
| Paved_Drive | 0 | 1 | FALSE | 3 | Pav: 2652, Dir: 216, Par: 62 |
| Pool_QC | 0 | 1 | FALSE | 5 | No_: 2917, Exc: 4, Goo: 4, Typ: 3 |
| Fence | 0 | 1 | FALSE | 5 | No_: 2358, Min: 330, Goo: 118, Goo: 112 |
| Misc_Feature | 0 | 1 | FALSE | 6 | Non: 2824, She: 95, Gar: 5, Oth: 4 |
| Sale_Type | 0 | 1 | FALSE | 10 | WD : 2536, New: 239, COD: 87, Con: 26 |
| Sale_Condition | 0 | 1 | FALSE | 6 | Nor: 2413, Par: 245, Abn: 190, Fam: 46 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Lot_Frontage | 0 | 1 | 57.65 | 33.50 | 0.00 | 43.00 | 63.00 | 78.00 | 313.00 | ▇▇▁▁▁ |
| Lot_Area | 0 | 1 | 10147.92 | 7880.02 | 1300.00 | 7440.25 | 9436.50 | 11555.25 | 215245.00 | ▇▁▁▁▁ |
| Year_Built | 0 | 1 | 1971.36 | 30.25 | 1872.00 | 1954.00 | 1973.00 | 2001.00 | 2010.00 | ▁▂▃▆▇ |
| Year_Remod_Add | 0 | 1 | 1984.27 | 20.86 | 1950.00 | 1965.00 | 1993.00 | 2004.00 | 2010.00 | ▅▂▂▃▇ |
| Mas_Vnr_Area | 0 | 1 | 101.10 | 178.63 | 0.00 | 0.00 | 0.00 | 162.75 | 1600.00 | ▇▁▁▁▁ |
| BsmtFin_SF_1 | 0 | 1 | 4.18 | 2.23 | 0.00 | 3.00 | 3.00 | 7.00 | 7.00 | ▃▂▇▁▇ |
| BsmtFin_SF_2 | 0 | 1 | 49.71 | 169.14 | 0.00 | 0.00 | 0.00 | 0.00 | 1526.00 | ▇▁▁▁▁ |
| Bsmt_Unf_SF | 0 | 1 | 559.07 | 439.54 | 0.00 | 219.00 | 465.50 | 801.75 | 2336.00 | ▇▅▂▁▁ |
| Total_Bsmt_SF | 0 | 1 | 1051.26 | 440.97 | 0.00 | 793.00 | 990.00 | 1301.50 | 6110.00 | ▇▃▁▁▁ |
| First_Flr_SF | 0 | 1 | 1159.56 | 391.89 | 334.00 | 876.25 | 1084.00 | 1384.00 | 5095.00 | ▇▃▁▁▁ |
| Second_Flr_SF | 0 | 1 | 335.46 | 428.40 | 0.00 | 0.00 | 0.00 | 703.75 | 2065.00 | ▇▃▂▁▁ |
| Low_Qual_Fin_SF | 0 | 1 | 4.68 | 46.31 | 0.00 | 0.00 | 0.00 | 0.00 | 1064.00 | ▇▁▁▁▁ |
| Gr_Liv_Area | 0 | 1 | 1499.69 | 505.51 | 334.00 | 1126.00 | 1442.00 | 1742.75 | 5642.00 | ▇▇▁▁▁ |
| Bsmt_Full_Bath | 0 | 1 | 0.43 | 0.52 | 0.00 | 0.00 | 0.00 | 1.00 | 3.00 | ▇▆▁▁▁ |
| Bsmt_Half_Bath | 0 | 1 | 0.06 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 | ▇▁▁▁▁ |
| Full_Bath | 0 | 1 | 1.57 | 0.55 | 0.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▇▁▁ |
| Half_Bath | 0 | 1 | 0.38 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 2.00 | ▇▁▅▁▁ |
| Bedroom_AbvGr | 0 | 1 | 2.85 | 0.83 | 0.00 | 2.00 | 3.00 | 3.00 | 8.00 | ▁▇▂▁▁ |
| Kitchen_AbvGr | 0 | 1 | 1.04 | 0.21 | 0.00 | 1.00 | 1.00 | 1.00 | 3.00 | ▁▇▁▁▁ |
| TotRms_AbvGrd | 0 | 1 | 6.44 | 1.57 | 2.00 | 5.00 | 6.00 | 7.00 | 15.00 | ▁▇▂▁▁ |
| Fireplaces | 0 | 1 | 0.60 | 0.65 | 0.00 | 0.00 | 1.00 | 1.00 | 4.00 | ▇▇▁▁▁ |
| Garage_Cars | 0 | 1 | 1.77 | 0.76 | 0.00 | 1.00 | 2.00 | 2.00 | 5.00 | ▅▇▂▁▁ |
| Garage_Area | 0 | 1 | 472.66 | 215.19 | 0.00 | 320.00 | 480.00 | 576.00 | 1488.00 | ▃▇▃▁▁ |
| Wood_Deck_SF | 0 | 1 | 93.75 | 126.36 | 0.00 | 0.00 | 0.00 | 168.00 | 1424.00 | ▇▁▁▁▁ |
| Open_Porch_SF | 0 | 1 | 47.53 | 67.48 | 0.00 | 0.00 | 27.00 | 70.00 | 742.00 | ▇▁▁▁▁ |
| Enclosed_Porch | 0 | 1 | 23.01 | 64.14 | 0.00 | 0.00 | 0.00 | 0.00 | 1012.00 | ▇▁▁▁▁ |
| Three_season_porch | 0 | 1 | 2.59 | 25.14 | 0.00 | 0.00 | 0.00 | 0.00 | 508.00 | ▇▁▁▁▁ |
| Screen_Porch | 0 | 1 | 16.00 | 56.09 | 0.00 | 0.00 | 0.00 | 0.00 | 576.00 | ▇▁▁▁▁ |
| Pool_Area | 0 | 1 | 2.24 | 35.60 | 0.00 | 0.00 | 0.00 | 0.00 | 800.00 | ▇▁▁▁▁ |
| Misc_Val | 0 | 1 | 50.64 | 566.34 | 0.00 | 0.00 | 0.00 | 0.00 | 17000.00 | ▇▁▁▁▁ |
| Mo_Sold | 0 | 1 | 6.22 | 2.71 | 1.00 | 4.00 | 6.00 | 8.00 | 12.00 | ▅▆▇▃▃ |
| Year_Sold | 0 | 1 | 2007.79 | 1.32 | 2006.00 | 2007.00 | 2008.00 | 2009.00 | 2010.00 | ▇▇▇▇▃ |
| Sale_Price | 0 | 1 | 180796.06 | 79886.69 | 12789.00 | 129500.00 | 160000.00 | 213500.00 | 755000.00 | ▇▇▁▁▁ |
| Longitude | 0 | 1 | -93.64 | 0.03 | -93.69 | -93.66 | -93.64 | -93.62 | -93.58 | ▅▅▇▆▁ |
| Latitude | 0 | 1 | 42.03 | 0.02 | 41.99 | 42.02 | 42.03 | 42.05 | 42.06 | ▂▂▇▇▇ |
Apparently this column has no missing values here. Let us take a closer look: first we find the row with the missing value in the original dataset, and then we look up the value of that row in the new dataset.
row_na <- dados %>%
  rowid_to_column() %>%
  # refer to the column directly inside filter() instead of dados$Electrical
  filter(is.na(Electrical)) %>%
  select(rowid)
(dados$Electrical[row_na$rowid])
## [1] <NA>
## Levels: Mix < FuseP < FuseF < FuseA < SBrkr
(dados2$Electrical[row_na$rowid])
## [1] Unknown
## Levels: FuseA FuseF FuseP Mix SBrkr Unknown
In make_ames() the value is “Unknown”, which make_ordinal_ames() stores as NA. Since the “Electrical” information is unavailable in both versions for just this one row, we use drop_na() on the ordered dataset and proceed with the analysis.
dados <- dados %>%
drop_na()
skim(dados)
| Name | dados |
| Number of rows | 2929 |
| Number of columns | 81 |
| _______________________ | |
| Column type frequency: | |
| factor | 46 |
| numeric | 35 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| MS_SubClass | 0 | 1 | FALSE | 16 | One: 1079, Two: 575, One: 287, One: 192 |
| MS_Zoning | 0 | 1 | FALSE | 7 | Res: 2272, Res: 462, Flo: 139, Res: 27 |
| Street | 0 | 1 | FALSE | 2 | Pav: 2917, Grv: 12 |
| Alley | 0 | 1 | FALSE | 3 | No_: 2731, Gra: 120, Pav: 78 |
| Lot_Shape | 0 | 1 | TRUE | 4 | Reg: 1858, Sli: 979, Mod: 76, Irr: 16 |
| Land_Contour | 0 | 1 | TRUE | 4 | Lvl: 2632, HLS: 120, Bnk: 117, Low: 60 |
| Utilities | 0 | 1 | TRUE | 3 | All: 2926, NoS: 2, NoS: 1, ELO: 0 |
| Lot_Config | 0 | 1 | FALSE | 5 | Ins: 2139, Cor: 511, Cul: 180, FR2: 85 |
| Land_Slope | 0 | 1 | TRUE | 3 | Gtl: 2788, Mod: 125, Sev: 16 |
| Neighborhood | 0 | 1 | FALSE | 28 | Nor: 443, Col: 267, Old: 239, Edw: 194 |
| Condition_1 | 0 | 1 | FALSE | 9 | Nor: 2521, Fee: 164, Art: 92, RRA: 50 |
| Condition_2 | 0 | 1 | FALSE | 8 | Nor: 2899, Fee: 13, Art: 5, Pos: 4 |
| Bldg_Type | 0 | 1 | FALSE | 5 | One: 2424, Twn: 233, Dup: 109, Twn: 101 |
| House_Style | 0 | 1 | FALSE | 8 | One: 1481, Two: 873, One: 314, SLv: 127 |
| Overall_Qual | 0 | 1 | TRUE | 10 | Ave: 824, Abo: 732, Goo: 602, Ver: 350 |
| Overall_Cond | 0 | 1 | TRUE | 9 | Ave: 1653, Abo: 533, Goo: 390, Ver: 144 |
| Roof_Style | 0 | 1 | FALSE | 6 | Gab: 2320, Hip: 551, Gam: 22, Fla: 20 |
| Roof_Matl | 0 | 1 | FALSE | 8 | Com: 2886, Tar: 23, WdS: 9, WdS: 7 |
| Exterior_1st | 0 | 1 | FALSE | 16 | Vin: 1025, Met: 450, HdB: 442, Wd : 420 |
| Exterior_2nd | 0 | 1 | FALSE | 17 | Vin: 1014, Met: 447, HdB: 406, Wd : 397 |
| Mas_Vnr_Type | 0 | 1 | FALSE | 5 | Non: 1774, Brk: 880, Sto: 249, Brk: 25 |
| Exter_Qual | 0 | 1 | TRUE | 4 | Typ: 1798, Goo: 989, Exc: 107, Fai: 35 |
| Exter_Cond | 0 | 1 | TRUE | 5 | Typ: 2548, Goo: 299, Fai: 67, Exc: 12 |
| Foundation | 0 | 1 | FALSE | 6 | PCo: 1309, CBl: 1244, Brk: 311, Sla: 49 |
| Bsmt_Qual | 0 | 1 | TRUE | 6 | Typ: 1283, Goo: 1218, Exc: 258, Fai: 88 |
| Bsmt_Cond | 0 | 1 | TRUE | 6 | Typ: 2615, Goo: 122, Fai: 104, No_: 80 |
| Bsmt_Exposure | 0 | 1 | TRUE | 5 | No: 1905, Av: 418, Gd: 284, Mn: 239 |
| BsmtFin_Type_1 | 0 | 1 | TRUE | 7 | GLQ: 859, Unf: 850, ALQ: 429, Rec: 288 |
| BsmtFin_Type_2 | 0 | 1 | TRUE | 7 | Unf: 2498, Rec: 106, LwQ: 89, No_: 81 |
| Heating | 0 | 1 | FALSE | 6 | Gas: 2884, Gas: 27, Gra: 9, Wal: 6 |
| Heating_QC | 0 | 1 | TRUE | 5 | Exc: 1495, Typ: 864, Goo: 475, Fai: 92 |
| Central_Air | 0 | 1 | FALSE | 2 | Y: 2733, N: 196 |
| Electrical | 0 | 1 | TRUE | 5 | SBr: 2682, Fus: 188, Fus: 50, Fus: 8 |
| Kitchen_Qual | 0 | 1 | TRUE | 5 | Typ: 1494, Goo: 1159, Exc: 205, Fai: 70 |
| Functional | 0 | 1 | TRUE | 8 | Typ: 2727, Min: 70, Min: 65, Mod: 35 |
| Fireplace_Qu | 0 | 1 | TRUE | 6 | No_: 1421, Goo: 744, Typ: 600, Fai: 75 |
| Garage_Type | 0 | 1 | FALSE | 7 | Att: 1731, Det: 782, Bui: 185, No_: 157 |
| Garage_Finish | 0 | 1 | TRUE | 4 | Unf: 1231, RFn: 812, Fin: 727, No_: 159 |
| Garage_Qual | 0 | 1 | TRUE | 6 | Typ: 2614, No_: 159, Fai: 124, Goo: 24 |
| Garage_Cond | 0 | 1 | TRUE | 6 | Typ: 2664, No_: 159, Fai: 74, Goo: 15 |
| Paved_Drive | 0 | 1 | TRUE | 3 | Pav: 2651, Dir: 216, Par: 62 |
| Pool_QC | 0 | 1 | TRUE | 5 | No_: 2916, Goo: 4, Exc: 4, Typ: 3 |
| Fence | 0 | 1 | TRUE | 5 | No_: 2357, Min: 330, Goo: 118, Goo: 112 |
| Misc_Feature | 0 | 1 | FALSE | 6 | Non: 2823, She: 95, Gar: 5, Oth: 4 |
| Sale_Type | 0 | 1 | FALSE | 10 | WD : 2535, New: 239, COD: 87, Con: 26 |
| Sale_Condition | 0 | 1 | FALSE | 6 | Nor: 2412, Par: 245, Abn: 190, Fam: 46 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Lot_Frontage | 0 | 1 | 57.64 | 33.50 | 0.00 | 43.00 | 63.00 | 78.00 | 313.00 | ▇▇▁▁▁ |
| Lot_Area | 0 | 1 | 10148.06 | 7881.36 | 1300.00 | 7440.00 | 9434.00 | 11556.00 | 215245.00 | ▇▁▁▁▁ |
| Year_Built | 0 | 1 | 1971.34 | 30.24 | 1872.00 | 1954.00 | 1973.00 | 2001.00 | 2010.00 | ▁▂▃▆▇ |
| Year_Remod_Add | 0 | 1 | 1984.26 | 20.86 | 1950.00 | 1965.00 | 1993.00 | 2004.00 | 2010.00 | ▅▂▂▃▇ |
| Mas_Vnr_Area | 0 | 1 | 101.13 | 178.66 | 0.00 | 0.00 | 0.00 | 163.00 | 1600.00 | ▇▁▁▁▁ |
| BsmtFin_SF_1 | 0 | 1 | 4.18 | 2.23 | 0.00 | 3.00 | 3.00 | 7.00 | 7.00 | ▃▂▇▁▇ |
| BsmtFin_SF_2 | 0 | 1 | 49.72 | 169.17 | 0.00 | 0.00 | 0.00 | 0.00 | 1526.00 | ▇▁▁▁▁ |
| Bsmt_Unf_SF | 0 | 1 | 559.13 | 439.60 | 0.00 | 219.00 | 466.00 | 802.00 | 2336.00 | ▇▅▂▁▁ |
| Total_Bsmt_SF | 0 | 1 | 1051.48 | 440.87 | 0.00 | 793.00 | 990.00 | 1302.00 | 6110.00 | ▇▃▁▁▁ |
| First_Flr_SF | 0 | 1 | 1159.70 | 391.89 | 334.00 | 877.00 | 1084.00 | 1384.00 | 5095.00 | ▇▃▁▁▁ |
| Second_Flr_SF | 0 | 1 | 335.35 | 428.43 | 0.00 | 0.00 | 0.00 | 704.00 | 2065.00 | ▇▃▂▁▁ |
| Low_Qual_Fin_SF | 0 | 1 | 4.68 | 46.32 | 0.00 | 0.00 | 0.00 | 0.00 | 1064.00 | ▇▁▁▁▁ |
| Gr_Liv_Area | 0 | 1 | 1499.73 | 505.59 | 334.00 | 1126.00 | 1442.00 | 1743.00 | 5642.00 | ▇▇▁▁▁ |
| Bsmt_Full_Bath | 0 | 1 | 0.43 | 0.52 | 0.00 | 0.00 | 0.00 | 1.00 | 3.00 | ▇▆▁▁▁ |
| Bsmt_Half_Bath | 0 | 1 | 0.06 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 | ▇▁▁▁▁ |
| Full_Bath | 0 | 1 | 1.57 | 0.55 | 0.00 | 1.00 | 2.00 | 2.00 | 4.00 | ▁▇▇▁▁ |
| Half_Bath | 0 | 1 | 0.38 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 2.00 | ▇▁▅▁▁ |
| Bedroom_AbvGr | 0 | 1 | 2.85 | 0.83 | 0.00 | 2.00 | 3.00 | 3.00 | 8.00 | ▁▇▂▁▁ |
| Kitchen_AbvGr | 0 | 1 | 1.04 | 0.21 | 0.00 | 1.00 | 1.00 | 1.00 | 3.00 | ▁▇▁▁▁ |
| TotRms_AbvGrd | 0 | 1 | 6.44 | 1.57 | 2.00 | 5.00 | 6.00 | 7.00 | 15.00 | ▁▇▂▁▁ |
| Fireplaces | 0 | 1 | 0.60 | 0.65 | 0.00 | 0.00 | 1.00 | 1.00 | 4.00 | ▇▇▁▁▁ |
| Garage_Cars | 0 | 1 | 1.77 | 0.76 | 0.00 | 1.00 | 2.00 | 2.00 | 5.00 | ▅▇▂▁▁ |
| Garage_Area | 0 | 1 | 472.68 | 215.22 | 0.00 | 320.00 | 480.00 | 576.00 | 1488.00 | ▃▇▃▁▁ |
| Wood_Deck_SF | 0 | 1 | 93.75 | 126.38 | 0.00 | 0.00 | 0.00 | 168.00 | 1424.00 | ▇▁▁▁▁ |
| Open_Porch_SF | 0 | 1 | 47.55 | 67.49 | 0.00 | 0.00 | 27.00 | 70.00 | 742.00 | ▇▁▁▁▁ |
| Enclosed_Porch | 0 | 1 | 23.02 | 64.15 | 0.00 | 0.00 | 0.00 | 0.00 | 1012.00 | ▇▁▁▁▁ |
| Three_season_porch | 0 | 1 | 2.59 | 25.15 | 0.00 | 0.00 | 0.00 | 0.00 | 508.00 | ▇▁▁▁▁ |
| Screen_Porch | 0 | 1 | 16.01 | 56.10 | 0.00 | 0.00 | 0.00 | 0.00 | 576.00 | ▇▁▁▁▁ |
| Pool_Area | 0 | 1 | 2.24 | 35.60 | 0.00 | 0.00 | 0.00 | 0.00 | 800.00 | ▇▁▁▁▁ |
| Misc_Val | 0 | 1 | 50.65 | 566.44 | 0.00 | 0.00 | 0.00 | 0.00 | 17000.00 | ▇▁▁▁▁ |
| Mo_Sold | 0 | 1 | 6.22 | 2.71 | 1.00 | 4.00 | 6.00 | 8.00 | 12.00 | ▅▆▇▃▃ |
| Year_Sold | 0 | 1 | 2007.79 | 1.32 | 2006.00 | 2007.00 | 2008.00 | 2009.00 | 2010.00 | ▇▇▇▇▃ |
| Sale_Price | 0 | 1 | 180800.60 | 79899.96 | 12789.00 | 129500.00 | 160000.00 | 213500.00 | 755000.00 | ▇▇▁▁▁ |
| Longitude | 0 | 1 | -93.64 | 0.03 | -93.69 | -93.66 | -93.64 | -93.62 | -93.58 | ▅▅▇▆▁ |
| Latitude | 0 | 1 | 42.03 | 0.02 | 41.99 | 42.02 | 42.03 | 42.05 | 42.06 | ▂▂▇▇▇ |
To better understand how the data behave, we carry out an exploratory analysis of the dataset, looking for relationships with the response variable “Sale_Price”.
We start by looking at the profile of this variable:
dados %>%
ggplot(aes(Longitude, Latitude, color = Sale_Price)) +
geom_point(size = 0.5, alpha = 0.7) +
scale_color_gradient(low = 'skyblue', high = 'darkred', labels = scales::label_number_si())
The most expensive houses are concentrated mainly in the north of the region analysed. ‘Neighborhood’, as well as Latitude and Longitude, is therefore likely to be strongly associated with price.
dados %>%
ggplot(aes(Neighborhood, Sale_Price, fill = Neighborhood)) +
geom_boxplot(show.legend = FALSE)+
scale_y_continuous(labels = scales::label_number_si())+
coord_flip()
dados %>%
ggplot(aes(x=Sale_Price)) +
geom_histogram(fill="skyblue", binwidth = 10000) +
scale_x_continuous(breaks= seq(0, 800000, by=100000), labels = scales::label_number_si())
(summary(dados$Sale_Price))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12789 129500 160000 180801 213500 755000
The histogram shows the observations concentrated toward the lower end of the price range. This is expected, since fewer people can afford more expensive houses. Note that, despite this skew, the mean (180,801) and the median (160,000) are still fairly close.
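As a rough illustration of the skew described above, the sketch below compares mean and median on a small right-skewed sample and checks how a log transform changes the asymmetry. The price values and the `skewness()` helper are hypothetical and only serve the example; they are not taken from the Ames data.

```r
# Toy right-skewed prices (hypothetical values, not taken from the Ames data)
prices <- c(90000, 120000, 130000, 150000, 160000,
            175000, 210000, 260000, 400000, 750000)

mean_price   <- mean(prices)
median_price <- median(prices)

# In a right-skewed sample the long upper tail pulls the mean above the median
mean_price > median_price

# Simple moment-based skewness (helper defined here only for illustration)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# A log transform compresses the upper tail and reduces the skewness
skew_raw <- skewness(prices)
skew_log <- skewness(log(prices))
c(raw = skew_raw, log = skew_log)
```

If the skew turned out to hurt the linear models, modeling the logarithm of the price would be one option to consider.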
The goal of this step is to see which variables are most correlated with “Sale_Price”, to get a first idea of the predictors and to guide the rest of the exploratory analysis.
# numeric variables
num_vars <- which(sapply(dados, is.numeric))
# correlation between all numeric variables
corr_all <- cor(dados[,num_vars], use="pairwise.complete.obs")
# correlation with Sale_Price in decreasing order
(corr_sorted <- as.matrix(sort(corr_all[,'Sale_Price'], decreasing = TRUE)))
## [,1]
## Sale_Price 1.00000000
## Gr_Liv_Area 0.70677666
## Garage_Cars 0.64759257
## Garage_Area 0.64013460
## Total_Bsmt_SF 0.63269326
## First_Flr_SF 0.62173389
## Year_Built 0.55861903
## Full_Bath 0.54570831
## Year_Remod_Add 0.53314636
## Mas_Vnr_Area 0.50219365
## TotRms_AbvGrd 0.49550750
## Fireplaces 0.47457710
## Wood_Deck_SF 0.32714767
## Open_Porch_SF 0.31293846
## Latitude 0.29101035
## Half_Bath 0.28520178
## Bsmt_Full_Bath 0.27580905
## Second_Flr_SF 0.26943829
## Lot_Area 0.26654763
## Lot_Frontage 0.20190876
## Bsmt_Unf_SF 0.18329078
## Bedroom_AbvGr 0.14392488
## Screen_Porch 0.11213709
## Pool_Area 0.06840003
## Mo_Sold 0.03523475
## Three_season_porch 0.03221900
## BsmtFin_SF_2 0.00600098
## Misc_Val -0.01569664
## Year_Sold -0.03056032
## Bsmt_Half_Bath -0.03583132
## Low_Qual_Fin_SF -0.03766575
## Kitchen_AbvGr -0.11982695
## Enclosed_Porch -0.12881128
## BsmtFin_SF_1 -0.13487107
## Longitude -0.25141580
# variables highly correlated with Sale_Price (absolute correlation above 0.5)
corr_high <- names(which(apply(corr_sorted, 1, function(x) abs(x)>0.5)))
dados[,corr_high] %>%
ggpairs()
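The filtering step above can also be written on a plain named vector, which some readers may find easier to follow. The sketch below uses the built-in `mtcars` data with `mpg` standing in for the response; the dataset and the 0.8 cut-off are assumptions chosen only so that the example runs without the Ames data.

```r
# mtcars ships with base R; mpg stands in for the response here
num_cols <- vapply(mtcars, is.numeric, logical(1))
corr_mat <- cor(mtcars[, num_cols], use = "pairwise.complete.obs")

# Correlations with the response, strongest first by absolute value
corr_resp <- corr_mat[, "mpg"]
corr_resp <- corr_resp[order(abs(corr_resp), decreasing = TRUE)]

# Names of the variables whose absolute correlation exceeds the cut-off
threshold <- 0.8
high_corr <- names(corr_resp)[abs(corr_resp) > threshold]
high_corr
```

Sorting by absolute value keeps strong negative correlations next to strong positive ones, which a plain decreasing sort would push to the bottom of the list.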
dados %>%
ggplot(aes(Year_Built, Sale_Price, size = Lot_Area/1000, color = Foundation)) +
geom_point(alpha = 0.5)+
scale_y_continuous(labels = scales::label_number_si())
The most striking feature of this plot is the change in the foundation material used over the years. The relationship between year of construction and price is also confirmed, since ‘Year_Built’ is one of the numeric variables with a correlation above 0.5.
Garage area and garage capacity in cars are near the top of the list of correlated variables, right after Gr_Liv_Area. Let us look at how these and other related variables behave.
dados %>%
ggplot(aes(Garage_Area, Sale_Price, size = Garage_Cars, color = Garage_Type)) +
geom_point(alpha = 0.5)+
scale_y_continuous(labels = scales::label_number_si())
dados %>%
ggplot(aes(Garage_Qual, Sale_Price, color = Garage_Finish)) +
geom_boxplot()+
scale_y_continuous(labels = scales::label_number_si())
Besides the highly correlated variables, each garage type is concentrated at different price levels. The garage finish, however, does not appear to have any meaningful relationship with price.
dados %>%
ggplot(aes(Gr_Liv_Area, Sale_Price, size = Fireplaces, color = Sale_Condition)) +
geom_point(alpha = 0.5)+
scale_y_continuous(labels = scales::label_number_si())
This plot simply confirms the correlation of the number of fireplaces and the above-ground living area with price. The “Normal” sale condition predominates in the sample.
dados %>%
ggplot(aes(Overall_Qual, Sale_Price, fill = Street)) +
geom_boxplot()+
scale_y_continuous(labels = scales::label_number_si())
dados %>%
ggplot(aes(Overall_Cond, Sale_Price, fill = Street)) +
geom_boxplot()+
scale_y_continuous(labels = scales::label_number_si())
Although it is stored in the dataset as a categorical variable, “Overall Quality” appears to be strongly associated with Sale_Price.
Before feeding the data to the models, we need to split the dataset into training and test sets and make sure it is in a format that the algorithms can interpret. The “strata” argument makes the split proportional with respect to the indicated variable, so that both sets end up with a similar Sale_Price distribution.
set.seed(123)
(ames_split <- initial_split(dados, prop = 0.8, strata = 'Sale_Price'))
## <2345/584/2929>
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
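To make the idea of a stratified split more concrete, here is a minimal base R sketch: the response is binned into quartiles and 80% of the rows are sampled inside each bin. It assumes a simulated response `y` in place of Sale_Price and is only an approximation of what `initial_split()` does internally, not its actual implementation.

```r
set.seed(123)

# Hypothetical skewed response standing in for Sale_Price
y <- rlnorm(1000, meanlog = 12, sdlog = 0.4)

# Bin the response into quartiles
bins <- cut(y, breaks = quantile(y, probs = seq(0, 1, 0.25)),
            include.lowest = TRUE)

# Sample 80% of the row indices inside each bin
train_idx <- unlist(lapply(split(seq_along(y), bins), function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}))

train <- y[train_idx]
test  <- y[-train_idx]

# Share of each quartile bin that went to the training set
table(bins[train_idx]) / table(bins)
```

Because every bin contributes the same share of rows, cheap and expensive houses are represented in similar proportions in both sets.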
The tidymodels interface lets us build a recipe with the “recipes” package to process the data before modeling. This also makes it easier to pre-process the test set at the end of the report, because exactly the same transformations are applied. Some variables are already stored as ordered factors, so there is no need to order them. However, the recipe needs a step that converts the order into numbers: step_ordinalscore.
The ordered variables are:
ord_vars <- vapply(dados, is.ordered, logical(1))
(ordered <-names(ord_vars)[ord_vars])
## [1] "Lot_Shape" "Land_Contour" "Utilities" "Land_Slope"
## [5] "Overall_Qual" "Overall_Cond" "Exter_Qual" "Exter_Cond"
## [9] "Bsmt_Qual" "Bsmt_Cond" "Bsmt_Exposure" "BsmtFin_Type_1"
## [13] "BsmtFin_Type_2" "Heating_QC" "Electrical" "Kitchen_Qual"
## [17] "Functional" "Fireplace_Qu" "Garage_Finish" "Garage_Qual"
## [21] "Garage_Cond" "Paved_Drive" "Pool_QC" "Fence"
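What converting an ordered factor into a numeric score means can be shown in base R. The sketch below assumes a hypothetical quality rating with four levels; it illustrates the idea behind step_ordinalscore (mapping each value to its position in the level order) rather than reproducing the function itself.

```r
# Hypothetical quality ratings; the level order carries the information
qual <- factor(c("Typical", "Poor", "Excellent", "Good"),
               levels = c("Poor", "Typical", "Good", "Excellent"),
               ordered = TRUE)

# as.integer() returns each value's position in the level order
scores <- as.integer(qual)
scores
```

This encoding treats the distance between adjacent levels as equal, which is a modeling assumption worth keeping in mind for the quality and condition columns.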
Because some columns have many categories, those with more than 10 are reduced with “step_other”, which pools infrequent levels (10 is the number of ratings used in the columns that assess quality). After that, dummy variables are created for the remaining categorical columns (step_dummy), variables with a single value are removed (step_zv), and the data are normalized (step_normalize).
(ames_rec <- recipe(Sale_Price ~ ., data = ames_train) %>%
  # step_log(Sale_Price) %>%
  # pool levels that occur in less than 2% of the training rows into "other"
  step_other(MS_SubClass, Neighborhood, Exterior_1st, Exterior_2nd, threshold = 0.02) %>%
  # convert the ordered factors listed in `ordered` to numeric scores
  step_ordinalscore(all_of(ordered)) %>%
  # dummy-encode the remaining nominal predictors
  step_dummy(all_nominal(), -all_outcomes()) %>%
  # drop predictors with zero variance
  step_zv(all_predictors()) %>%
  # center and scale every numeric column, including Sale_Price
  step_normalize(all_numeric()) %>%
  prep())
## Data Recipe
##
## Inputs:
##
## role #variables
## outcome 1
## predictor 80
##
## Training data contained 2345 data points and no missing data.
##
## Operations:
##
## Collapsing factor levels for MS_SubClass, Neighborhood, ... [trained]
## Scoring for Lot_Shape, Land_Contour, Utilities, ... [trained]
## Dummy variables from MS_SubClass, MS_Zoning, Street, Alley, ... [trained]
## Zero variance filter removed Roof_Matl_Membran [trained]
## Centering and scaling for Lot_Frontage, Lot_Area, Lot_Shape, ... [trained]
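The level pooling reported in the first operation above can be sketched in base R as follows. The neighbourhood counts are hypothetical, and details such as how ties at the threshold are handled may differ from what step_other actually does.

```r
# Hypothetical neighbourhood column: two frequent levels and two rare ones
nbhd <- factor(c(rep("North_Ames", 60), rep("College_Creek", 38),
                 "Landmark", "Green_Hills"))

# Share of rows per level; levels below the threshold get pooled
threshold <- 0.02
shares <- prop.table(table(nbhd))
rare <- names(shares)[shares < threshold]

# Replace the rare levels with a single "other" level
pooled <- as.character(nbhd)
pooled[pooled %in% rare] <- "other"
pooled <- factor(pooled)
table(pooled)
```

Pooling rare levels before step_dummy keeps the number of dummy columns down and avoids columns that are almost entirely zero.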
The prepared recipe is then applied to the training and test sets with the “bake” function. For the training set we can use juice, which is a special case of bake that returns the data on which the recipe was prepared.
train_baked <- juice(ames_rec)
test_baked <- bake(ames_rec, new_data = ames_test)
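One consequence of baking the test set with a recipe prepared on the training set is that the centering and scaling statistics come from the training data only. The sketch below illustrates this with hypothetical numbers, and also shows how a normalized value can be mapped back to the original units, which matters here because the recipe normalizes Sale_Price as well.

```r
# Hypothetical training and test values of one numeric column
train_x <- c(100, 150, 200, 250, 300)
test_x  <- c(120, 400)

# Statistics are estimated on the training set only
mu    <- mean(train_x)
sigma <- sd(train_x)

train_z <- (train_x - mu) / sigma
test_z  <- (test_x - mu) / sigma   # reuses the training mean and sd

# A value on the normalized scale can be mapped back to the original units
back <- test_z * sigma + mu
back
```

Predictions made on the normalized Sale_Price scale would need the same back-transformation before being read as prices.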
Here is a look at the processed dataset:
glimpse(train_baked)
## Observations: 2,345
## Variables: 190
## $ Lot_Frontage <dbl> 2.5168358, 0.66…
## $ Lot_Area <dbl> 2.71565361, 0.1…
## $ Lot_Shape <dbl> -1.0682387, 0.7…
## $ Land_Contour <dbl> 0.312283, 0.312…
## $ Utilities <dbl> 0.02921031, 0.0…
## $ Land_Slope <dbl> 0.2173651, 0.21…
## $ Overall_Qual <dbl> -0.05944727, -0…
## $ Overall_Cond <dbl> -0.5150557, 0.4…
## $ Year_Built <dbl> -0.364886417, -…
## $ Year_Remod_Add <dbl> -1.16087360, -1…
## $ Mas_Vnr_Area <dbl> 0.06998395, -0.…
## $ Exter_Qual <dbl> -0.6789786, -0.…
## $ Exter_Cond <dbl> -0.2283709, -0.…
## $ Bsmt_Qual <dbl> -0.5325834, -0.…
## $ Bsmt_Cond <dbl> 1.8798912, 0.12…
## $ Bsmt_Exposure <dbl> 2.2266461, -0.5…
## $ BsmtFin_Type_1 <dbl> 0.2033097, -0.2…
## $ BsmtFin_SF_1 <dbl> -0.96665357, 0.…
## $ BsmtFin_Type_2 <dbl> -0.2962550, 0.7…
## $ BsmtFin_SF_2 <dbl> -0.2990407, 0.5…
## $ Bsmt_Unf_SF <dbl> -0.26299552, -0…
## $ Total_Bsmt_SF <dbl> 0.07029330, -0.…
## $ Heating_QC <dbl> -2.2481305, -1.…
## $ Electrical <dbl> 0.2785576, 0.27…
## $ First_Flr_SF <dbl> 1.29787851, -0.…
## $ Second_Flr_SF <dbl> -0.7923782, -0.…
## $ Low_Qual_Fin_SF <dbl> -0.0990501, -0.…
## $ Gr_Liv_Area <dbl> 0.315621947, -1…
## $ Bsmt_Full_Bath <dbl> 1.0782317, -0.8…
## $ Bsmt_Half_Bath <dbl> -0.2484801, -0.…
## $ Full_Bath <dbl> -1.0107668, -1.…
## $ Half_Bath <dbl> -0.7615697, -0.…
## $ Bedroom_AbvGr <dbl> 0.1789204, -1.0…
## $ Kitchen_AbvGr <dbl> -0.2049649, -0.…
## $ Kitchen_Qual <dbl> -0.7564502, -0.…
## $ TotRms_AbvGrd <dbl> 0.3492080, -0.9…
## $ Functional <dbl> 0.2361348, 0.23…
## $ Fireplaces <dbl> 2.1468643, -0.9…
## $ Fireplace_Qu <dbl> 1.2199533, -0.9…
## $ Garage_Finish <dbl> 1.4351461, -0.7…
## $ Garage_Cars <dbl> 0.3081701, -0.9…
## $ Garage_Area <dbl> 0.267762168, 1.…
## $ Garage_Qual <dbl> 0.2838117, 0.28…
## $ Garage_Cond <dbl> 0.2715513, 0.27…
## $ Paved_Drive <dbl> -1.594015, 0.30…
## $ Wood_Deck_SF <dbl> 0.9448053, 0.38…
## $ Open_Porch_SF <dbl> 0.20929475, -0.…
## $ Enclosed_Porch <dbl> -0.3783267, -0.…
## $ Three_season_porch <dbl> -0.1048607, -0.…
## $ Screen_Porch <dbl> -0.2875418, 1.8…
## $ Pool_Area <dbl> -0.06434762, -0…
## $ Pool_QC <dbl> -0.06587967, -0…
## $ Fence <dbl> -0.481867, 1.94…
## $ Misc_Val <dbl> -0.09131429, -0…
## $ Mo_Sold <dbl> -0.45149090, -0…
## $ Year_Sold <dbl> 1.689468, 1.689…
## $ Longitude <dbl> 0.9072780, 0.90…
## $ Latitude <dbl> 1.0702225, 1.01…
## $ Sale_Price <dbl> 0.423341430, -0…
## $ MS_SubClass_One_Story_1945_and_Older <dbl> -0.2196778, -0.…
## $ MS_SubClass_One_and_Half_Story_Finished_All_Ages <dbl> -0.3360188, -0.…
## $ MS_SubClass_Two_Story_1946_and_Newer <dbl> -0.4972273, -0.…
## $ MS_SubClass_Two_Story_1945_and_Older <dbl> -0.2186096, -0.…
## $ MS_SubClass_Split_or_Multilevel <dbl> -0.1962479, -0.…
## $ MS_SubClass_Duplex_All_Styles_and_Ages <dbl> -0.1927067, -0.…
## $ MS_SubClass_One_Story_PUD_1946_and_Newer <dbl> -0.2668989, -0.…
## $ MS_SubClass_Two_Story_PUD_1946_and_Newer <dbl> -0.2142931, -0.…
## $ MS_SubClass_Two_Family_conversion_All_Styles_and_Ages <dbl> -0.146056, -0.1…
## $ MS_SubClass_other <dbl> -0.2020322, -0.…
## $ MS_Zoning_Residential_High_Density <dbl> -0.09729581, 10…
## $ MS_Zoning_Residential_Low_Density <dbl> 0.5409237, -1.8…
## $ MS_Zoning_Residential_Medium_Density <dbl> -0.4334318, -0.…
## $ MS_Zoning_A_agr <dbl> -0.02921031, -0…
## $ MS_Zoning_C_all <dbl> -0.09729581, -0…
## $ MS_Zoning_I_all <dbl> -0.02921031, -0…
## $ Street_Pave <dbl> 0.06542804, 0.0…
## $ Alley_No_Alley_Access <dbl> 0.2741583, 0.27…
## $ Alley_Paved <dbl> -0.1633895, -0.…
## $ Lot_Config_CulDSac <dbl> -0.2566729, -0.…
## $ Lot_Config_FR2 <dbl> -0.171462, -0.1…
## $ Lot_Config_FR3 <dbl> -0.06542804, -0…
## $ Lot_Config_Inside <dbl> -1.6477758, 0.6…
## $ Neighborhood_College_Creek <dbl> -0.3168273, -0.…
## $ Neighborhood_Old_Town <dbl> -0.2968661, -0.…
## $ Neighborhood_Edwards <dbl> -0.2696379, -0.…
## $ Neighborhood_Somerset <dbl> -0.261358, -0.2…
## $ Neighborhood_Northridge_Heights <dbl> -0.2431948, -0.…
## $ Neighborhood_Gilbert <dbl> -0.2392371, -0.…
## $ Neighborhood_Sawyer <dbl> -0.2280769, -0.…
## $ Neighborhood_Northwest_Ames <dbl> -0.2186096, -0.…
## $ Neighborhood_Sawyer_West <dbl> -0.2175371, -0.…
## $ Neighborhood_Mitchell <dbl> -0.1927067, -0.…
## $ Neighborhood_Brookside <dbl> -0.2043071, -0.…
## $ Neighborhood_Crawford <dbl> -0.1915138, -0.…
## $ Neighborhood_Iowa_DOT_and_Rail_Road <dbl> -0.180474, -0.1…
## $ Neighborhood_Timberland <dbl> -0.156379, -0.1…
## $ Neighborhood_Northridge <dbl> -0.1549427, -0.…
## $ Neighborhood_other <dbl> -0.3445969, -0.…
## $ Condition_1_Feedr <dbl> -0.2480727, 4.0…
## $ Condition_1_Norm <dbl> 0.4010268, -2.4…
## $ Condition_1_PosA <dbl> -0.08021859, -0…
## $ Condition_1_PosN <dbl> -0.1138132, -0.…
## $ Condition_1_RRAe <dbl> -0.09036065, -0…
## $ Condition_1_RRAn <dbl> -0.1350159, -0.…
## $ Condition_1_RRNe <dbl> -0.04621516, -0…
## $ Condition_1_RRNn <dbl> -0.05063699, -0…
## $ Condition_2_Feedr <dbl> -0.06542804, -0…
## $ Condition_2_Norm <dbl> 0.092728, 0.092…
## $ Condition_2_PosA <dbl> -0.03578282, -0…
## $ Condition_2_PosN <dbl> -0.02921031, -0…
## $ Condition_2_RRAe <dbl> -0.02065041, -0…
## $ Condition_2_RRAn <dbl> -0.02065041, -0…
## $ Condition_2_RRNn <dbl> -0.02065041, -0…
## $ Bldg_Type_TwoFmCon <dbl> -0.1490719, -0.…
## $ Bldg_Type_Duplex <dbl> -0.1927067, -0.…
## $ Bldg_Type_Twnhs <dbl> -0.1891088, -0.…
## $ Bldg_Type_TwnhsE <dbl> -0.2968661, -0.…
## $ House_Style_One_and_Half_Unf <dbl> -0.08543589, -0…
## $ House_Style_One_Story <dbl> 0.9985085, 0.99…
## $ House_Style_SFoyer <dbl> -0.171462, -0.1…
## $ House_Style_SLvl <dbl> -0.2043071, -0.…
## $ House_Style_Two_and_Half_Fin <dbl> -0.05063699, -0…
## $ House_Style_Two_and_Half_Unf <dbl> -0.092728, -0.0…
## $ House_Style_Two_Story <dbl> -0.6548464, -0.…
## $ Roof_Style_Gable <dbl> -1.9528359, 0.5…
## $ Roof_Style_Gambrel <dbl> -0.08543589, -0…
## $ Roof_Style_Hip <dbl> 2.0832269, -0.4…
## $ Roof_Style_Mansard <dbl> -0.06205721, -0…
## $ Roof_Style_Shed <dbl> -0.04621516, -0…
## $ Roof_Matl_CompShg <dbl> 0.1138132, 0.11…
## $ Roof_Matl_Metal <dbl> -0.02065041, -0…
## $ Roof_Matl_Roll <dbl> -0.02065041, -0…
## $ Roof_Matl_Tar.Grv <dbl> -0.092728, -0.0…
## $ Roof_Matl_WdShake <dbl> -0.04132727, -0…
## $ Roof_Matl_WdShngl <dbl> -0.04132727, -0…
## $ Exterior_1st_CemntBd <dbl> -0.2164602, -0.…
## $ Exterior_1st_HdBoard <dbl> -0.4088717, -0.…
## $ Exterior_1st_MetalSd <dbl> -0.4278671, -0.…
## $ Exterior_1st_Plywood <dbl> -0.2874118, -0.…
## $ Exterior_1st_VinylSd <dbl> -0.7365675, 1.3…
## $ Exterior_1st_Wd.Sdng <dbl> -0.4124162, -0.…
## $ Exterior_1st_WdShing <dbl> -0.1445265, -0.…
## $ Exterior_1st_other <dbl> -0.1903146, -0.…
## $ Exterior_2nd_HdBoard <dbl> -0.3858537, -0.…
## $ Exterior_2nd_MetalSd <dbl> -0.4271696, -0.…
## $ Exterior_2nd_Plywood <dbl> 3.0994776, -0.3…
## $ Exterior_2nd_VinylSd <dbl> -0.7310662, 1.3…
## $ Exterior_2nd_Wd.Sdng <dbl> -0.4010268, -0.…
## $ Exterior_2nd_Wd.Shng <dbl> -0.1701403, -0.…
## $ Exterior_2nd_other <dbl> -0.2566729, -0.…
## $ Mas_Vnr_Type_BrkFace <dbl> -0.6442291, -0.…
## $ Mas_Vnr_Type_CBlock <dbl> -0.02065041, -0…
## $ Mas_Vnr_Type_None <dbl> -1.2521129, 0.7…
## $ Mas_Vnr_Type_Stone <dbl> 3.2476491, -0.3…
## $ Foundation_CBlock <dbl> 1.166598, 1.166…
## $ Foundation_PConc <dbl> -0.9002583, -0.…
## $ Foundation_Slab <dbl> -0.1230652, -0.…
## $ Foundation_Stone <dbl> -0.0547059, -0.…
## $ Foundation_Wood <dbl> -0.04132727, -0…
## $ Heating_GasA <dbl> 0.13002, 0.1300…
## $ Heating_GasW <dbl> -0.1037847, -0.…
## $ Heating_Grav <dbl> -0.0547059, -0.…
## $ Heating_OthW <dbl> -0.02921031, -0…
## $ Heating_Wall <dbl> -0.04132727, -0…
## $ Central_Air_Y <dbl> 0.2759515, 0.27…
## $ Garage_Type_Basment <dbl> -0.111876, -0.1…
## $ Garage_Type_BuiltIn <dbl> -0.2641392, -0.…
## $ Garage_Type_CarPort <dbl> -0.06863621, -0…
## $ Garage_Type_Detchd <dbl> -0.5993889, -0.…
## $ Garage_Type_More_Than_Two_Types <dbl> -0.092728, -0.0…
## $ Garage_Type_No_Garage <dbl> -0.2412222, -0.…
## $ Misc_Feature_Gar2 <dbl> -0.04621516, -0…
## $ Misc_Feature_None <dbl> 0.1962479, 0.19…
## $ Misc_Feature_Othr <dbl> -0.03578282, -0…
## $ Misc_Feature_Shed <dbl> -0.1842176, -0.…
## $ Misc_Feature_TenC <dbl> -0.02065041, -0…
## $ Sale_Type_Con <dbl> -0.04132727, -0…
## $ Sale_Type_ConLD <dbl> -0.09729581, -0…
## $ Sale_Type_ConLI <dbl> -0.0547059, -0.…
## $ Sale_Type_ConLw <dbl> -0.0547059, -0.…
## $ Sale_Type_CWD <dbl> -0.07170355, -0…
## $ Sale_Type_New <dbl> -0.298563, -0.2…
## $ Sale_Type_Oth <dbl> -0.05063699, -0…
## $ Sale_Type_VWD <dbl> -0.02065041, -0…
## $ Sale_Type_WD. <dbl> 0.3981573, 0.39…
## $ Sale_Condition_AdjLand <dbl> -0.07170355, -0…
## $ Sale_Condition_Alloca <dbl> -0.08543589, -0…
## $ Sale_Condition_Family <dbl> -0.130020, -0.1…
## $ Sale_Condition_Normal <dbl> 0.4683537, 0.46…
## $ Sale_Condition_Partial <dbl> -0.3027773, -0.…
Cross-validation is then applied to the training set. In addition, bootstrap samples are created to tune the hyperparameters of some of the models used.
set.seed(123)
(cv_splits <- vfold_cv(train_baked, v = 5, strata = Sale_Price))
## # 5-fold cross-validation using stratification
## # A tibble: 5 x 2
## splits id
## <named list> <chr>
## 1 <split [1.9K/470]> Fold1
## 2 <split [1.9K/470]> Fold2
## 3 <split [1.9K/469]> Fold3
## 4 <split [1.9K/468]> Fold4
## 5 <split [1.9K/468]> Fold5
set.seed(123)
(ames_boot <- bootstraps(train_baked, times = 10, strata = Sale_Price))
## # Bootstrap sampling using stratification
## # A tibble: 10 x 2
## splits id
## <named list> <chr>
## 1 <split [2.3K/865]> Bootstrap01
## 2 <split [2.3K/857]> Bootstrap02
## 3 <split [2.3K/890]> Bootstrap03
## 4 <split [2.3K/858]> Bootstrap04
## 5 <split [2.3K/872]> Bootstrap05
## 6 <split [2.3K/864]> Bootstrap06
## 7 <split [2.3K/855]> Bootstrap07
## 8 <split [2.3K/856]> Bootstrap08
## 9 <split [2.3K/844]> Bootstrap09
## 10 <split [2.3K/876]> Bootstrap10
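The assessment sizes in the bootstrap output (roughly 840 to 890 rows out of 2,345) follow from sampling with replacement: a given row is left out of a resample with probability (1 - 1/n)^n, which is close to exp(-1), about 0.368. The base R sketch below illustrates this using only the row count reported above; it does not use rsample and ignores the stratification by Sale_Price.

```r
set.seed(123)
n <- 2345  # number of training rows reported by glimpse()

# One bootstrap resample: draw n row indices with replacement
in_bag <- sample(n, size = n, replace = TRUE)

# Rows never drawn form the out-of-bag (assessment) set
n_oob <- n - length(unique(in_bag))

# Expected out-of-bag count under sampling with replacement
expected_oob <- n * (1 - 1 / n)^n

c(observed = n_oob, expected = round(expected_oob))
```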
Before fitting the models, it is necessary to specify the engine (the underlying package), the mode, and the parameters. The model is then fitted on the folds created by cross-validation to assess its performance.
lm_spec <- linear_reg() %>%
  set_engine('lm')
# fit on the CV folds (model specification first, as current versions of tune expect)
lm_res <- fit_resamples(lm_spec,
                        Sale_Price ~ .,
                        resamples = cv_splits,
                        control = control_resamples(save_pred = TRUE))
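Conceptually, fitting on resamples repeats a fit-and-score loop over the folds. The base R sketch below shows that loop for a linear model on mtcars (a stand-in dataset with an arbitrary two-predictor formula). It illustrates the idea only: it is not how tune implements fit_resamples(), and it omits stratification.

```r
set.seed(123)
k <- 5
dat <- mtcars

# Randomly assign each row to one of k folds
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

fold_rmse <- numeric(k)
for (i in seq_len(k)) {
  analysis_set   <- dat[fold_id != i, ]
  assessment_set <- dat[fold_id == i, ]

  # Fit on the analysis rows, then predict the held-out rows
  fit  <- lm(mpg ~ wt + hp, data = analysis_set)
  pred <- predict(fit, newdata = assessment_set)

  fold_rmse[i] <- sqrt(mean((assessment_set$mpg - pred)^2))
}

# Mean and standard error across folds, analogous to collect_metrics()
cv_summary <- c(mean = mean(fold_rmse), std_err = sd(fold_rmse) / sqrt(k))
cv_summary
```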
Summary of the model's performance across the folds:
(lm_res %>%
collect_metrics())
## # A tibble: 2 x 5
## .metric .estimator mean n std_err
## <chr> <chr> <dbl> <int> <dbl>
## 1 rmse standard 0.335 5 0.0336
## 2 rsq standard 0.887 5 0.0197
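For reference, both metrics can be reproduced by hand. The sketch below uses made-up numbers and assumes, as in yardstick, that rsq is the squared correlation between observed and predicted values. Because Sale_Price appears to have been centered and scaled along with the predictors (its values in the glimpse() output are close to zero), the RMSE of 0.335 is presumably expressed in standard deviations of the processed outcome rather than in dollars.

```r
# Illustrative observed and predicted values (not from the Ames data)
truth    <- c(-1.2, -0.4, 0.1, 0.6, 1.5)
estimate <- c(-1.0, -0.5, 0.3, 0.4, 1.3)

# Root mean squared error
rmse_manual <- sqrt(mean((truth - estimate)^2))

# R squared as the squared correlation
rsq_manual <- cor(truth, estimate)^2

c(rmse = rmse_manual, rsq = rsq_manual)
```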
Model fitted on the full training set, with its summary and variable importance:
lm_fit <- lm_spec %>%
fit(Sale_Price ~.,
data = train_baked)
lm_fit %>%
summary()
## Length Class Mode
## lvl 0 -none- NULL
## spec 5 linear_reg list
## fit 12 lm list
## preproc 1 -none- list
## elapsed 5 proc_time numeric
lm_fit %>%
tidy()
## # A tibble: 187 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -1.28e-14 0.00580 -2.21e-12 1.00e+ 0
## 2 Lot_Frontage 1.95e- 2 0.00720 2.71e+ 0 6.74e- 3
## 3 Lot_Area 4.44e- 2 0.00776 5.72e+ 0 1.20e- 8
## 4 Lot_Shape -5.80e- 3 0.00738 -7.86e- 1 4.32e- 1
## 5 Land_Contour -8.18e- 3 0.00803 -1.02e+ 0 3.08e- 1
## 6 Utilities 1.75e- 3 0.00712 2.47e- 1 8.05e- 1
## 7 Land_Slope 2.35e- 3 0.00829 2.84e- 1 7.77e- 1
## 8 Overall_Qual 1.48e- 1 0.0128 1.15e+ 1 7.32e-30
## 9 Overall_Cond 7.26e- 2 0.00852 8.52e+ 0 3.07e-17
## 10 Year_Built 1.54e- 1 0.0221 6.98e+ 0 4.05e-12
## # … with 177 more rows
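Since the predictors were standardized, coefficient magnitudes are on a comparable scale, so sorting by absolute estimate gives a rough ranking of the terms. The base R sketch below does this on a scaled copy of mtcars (illustrative data only); with strongly correlated predictors such a ranking should be read with caution.

```r
# Standardize every column so the coefficients share a scale
scaled <- as.data.frame(scale(mtcars))

fit <- lm(mpg ~ ., data = scaled)

# Coefficient table without the intercept
coefs <- coef(summary(fit))[-1, , drop = FALSE]

# Order terms by absolute estimate, largest first
ranked <- coefs[order(abs(coefs[, "Estimate"]), decreasing = TRUE), ]
head(ranked, 3)
```

As in the output above, the intercept is numerically zero because the outcome was centered as well.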
The tidymodels package does not offer a ready-made integration with the leaps package for stepwise selection. For that reason, the procedure is carried out here, but the resulting model is not considered in the final evaluation. Such an integration could be built with the parsnip package.
library(leaps)
stepb <- regsubsets(Sale_Price ~ ., data = train_baked, nvmax = 10,
method = "backward")
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 3 linear dependencies found
## Reordering variables and trying again:
resumo_stepb <- summary(stepb)
resumo_stepb$outmat
## Lot_Frontage Lot_Area Lot_Shape Land_Contour Utilities Land_Slope
## 1 ( 1 ) " " " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " " " "
## Overall_Qual Overall_Cond Year_Built Year_Remod_Add Mas_Vnr_Area
## 1 ( 1 ) "*" " " " " " " " "
## 2 ( 1 ) "*" " " " " " " " "
## 3 ( 1 ) "*" " " " " " " " "
## 4 ( 1 ) "*" " " " " " " " "
## 5 ( 1 ) "*" " " " " " " " "
## 6 ( 1 ) "*" " " " " " " " "
## 7 ( 1 ) "*" " " " " " " " "
## 8 ( 1 ) "*" " " " " " " " "
## 9 ( 1 ) "*" " " " " " " " "
## 10 ( 1 ) "*" " " " " " " " "
## 11 ( 1 ) "*" " " " " " " " "
## Exter_Qual Exter_Cond Bsmt_Qual Bsmt_Cond Bsmt_Exposure
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) "*" " " " " " " " "
## 5 ( 1 ) "*" " " " " " " " "
## 6 ( 1 ) "*" " " " " " " " "
## 7 ( 1 ) "*" " " " " " " " "
## 8 ( 1 ) "*" " " " " " " " "
## 9 ( 1 ) "*" " " " " " " " "
## 10 ( 1 ) "*" " " " " " " " "
## 11 ( 1 ) "*" " " " " " " " "
## BsmtFin_Type_1 BsmtFin_SF_1 BsmtFin_Type_2 BsmtFin_SF_2 Bsmt_Unf_SF
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " "*"
## 6 ( 1 ) " " " " " " " " "*"
## 7 ( 1 ) " " " " " " " " "*"
## 8 ( 1 ) " " " " " " " " "*"
## 9 ( 1 ) " " " " " " " " "*"
## 10 ( 1 ) " " " " " " " " "*"
## 11 ( 1 ) " " " " " " " " "*"
## Total_Bsmt_SF Heating_QC Electrical First_Flr_SF Second_Flr_SF
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " "*" " "
## 3 ( 1 ) " " " " " " "*" "*"
## 4 ( 1 ) " " " " " " "*" "*"
## 5 ( 1 ) " " " " " " "*" "*"
## 6 ( 1 ) "*" " " " " "*" "*"
## 7 ( 1 ) "*" " " " " "*" "*"
## 8 ( 1 ) "*" " " " " "*" "*"
## 9 ( 1 ) "*" " " " " "*" "*"
## 10 ( 1 ) "*" " " " " "*" "*"
## 11 ( 1 ) "*" " " " " "*" "*"
## Low_Qual_Fin_SF Gr_Liv_Area Bsmt_Full_Bath Bsmt_Half_Bath Full_Bath
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Half_Bath Bedroom_AbvGr Kitchen_AbvGr Kitchen_Qual TotRms_AbvGrd
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Functional Fireplaces Fireplace_Qu Garage_Finish Garage_Cars
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Garage_Area Garage_Qual Garage_Cond Paved_Drive Wood_Deck_SF
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Open_Porch_SF Enclosed_Porch Three_season_porch Screen_Porch
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Pool_Area Pool_QC Fence Misc_Val Mo_Sold Year_Sold Longitude Latitude
## 1 ( 1 ) " " " " " " " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " " " " " " " "
## MS_SubClass_One_Story_1945_and_Older
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_One_and_Half_Story_Finished_All_Ages
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_Two_Story_1946_and_Newer
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_Two_Story_1945_and_Older MS_SubClass_Split_or_Multilevel
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## MS_SubClass_Duplex_All_Styles_and_Ages
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_One_Story_PUD_1946_and_Newer
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_Two_Story_PUD_1946_and_Newer
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_Two_Family_conversion_All_Styles_and_Ages
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_other MS_Zoning_Residential_High_Density
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## MS_Zoning_Residential_Low_Density
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_Zoning_Residential_Medium_Density MS_Zoning_A_agr MS_Zoning_C_all
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## MS_Zoning_I_all Street_Pave Alley_No_Alley_Access Alley_Paved
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Lot_Config_CulDSac Lot_Config_FR2 Lot_Config_FR3 Lot_Config_Inside
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Neighborhood_College_Creek Neighborhood_Old_Town Neighborhood_Edwards
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Neighborhood_Somerset Neighborhood_Northridge_Heights
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## Neighborhood_Gilbert Neighborhood_Sawyer Neighborhood_Northwest_Ames
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Neighborhood_Sawyer_West Neighborhood_Mitchell Neighborhood_Brookside
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Neighborhood_Crawford Neighborhood_Iowa_DOT_and_Rail_Road
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## Neighborhood_Timberland Neighborhood_Northridge Neighborhood_other
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Condition_1_Feedr Condition_1_Norm Condition_1_PosA Condition_1_PosN
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Condition_1_RRAe Condition_1_RRAn Condition_1_RRNe Condition_1_RRNn
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Condition_2_Feedr Condition_2_Norm Condition_2_PosA Condition_2_PosN
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Condition_2_RRAe Condition_2_RRAn Condition_2_RRNn Bldg_Type_TwoFmCon
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Bldg_Type_Duplex Bldg_Type_Twnhs Bldg_Type_TwnhsE
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## House_Style_One_and_Half_Unf House_Style_One_Story House_Style_SFoyer
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## House_Style_SLvl House_Style_Two_and_Half_Fin
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## House_Style_Two_and_Half_Unf House_Style_Two_Story Roof_Style_Gable
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Roof_Style_Gambrel Roof_Style_Hip Roof_Style_Mansard Roof_Style_Shed
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Roof_Matl_CompShg Roof_Matl_Metal Roof_Matl_Roll Roof_Matl_Tar.Grv
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Roof_Matl_WdShake Roof_Matl_WdShngl Exterior_1st_CemntBd
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_1st_HdBoard Exterior_1st_MetalSd Exterior_1st_Plywood
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_1st_VinylSd Exterior_1st_Wd.Sdng Exterior_1st_WdShing
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_1st_other Exterior_2nd_HdBoard Exterior_2nd_MetalSd
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_2nd_Plywood Exterior_2nd_VinylSd Exterior_2nd_Wd.Sdng
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_2nd_Wd.Shng Exterior_2nd_other Mas_Vnr_Type_BrkFace
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Mas_Vnr_Type_CBlock Mas_Vnr_Type_None Mas_Vnr_Type_Stone
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Foundation_CBlock Foundation_PConc Foundation_Slab Foundation_Stone
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Foundation_Wood Heating_GasA Heating_GasW Heating_Grav Heating_OthW
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Heating_Wall Central_Air_Y Garage_Type_Basment Garage_Type_BuiltIn
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Garage_Type_CarPort Garage_Type_Detchd
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## Garage_Type_More_Than_Two_Types Garage_Type_No_Garage
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## Misc_Feature_Gar2 Misc_Feature_None Misc_Feature_Othr
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " "*" " "
## 9 ( 1 ) " " "*" " "
## 10 ( 1 ) "*" "*" " "
## 11 ( 1 ) "*" "*" "*"
## Misc_Feature_Shed Misc_Feature_TenC Sale_Type_Con Sale_Type_ConLD
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) "*" " " " " " "
## 10 ( 1 ) "*" " " " " " "
## 11 ( 1 ) "*" " " " " " "
## Sale_Type_ConLI Sale_Type_ConLw Sale_Type_CWD Sale_Type_New
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " "*"
## 8 ( 1 ) " " " " " " "*"
## 9 ( 1 ) " " " " " " "*"
## 10 ( 1 ) " " " " " " "*"
## 11 ( 1 ) " " " " " " "*"
## Sale_Type_Oth Sale_Type_VWD Sale_Type_WD. Sale_Condition_AdjLand
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Sale_Condition_Alloca Sale_Condition_Family Sale_Condition_Normal
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Sale_Condition_Partial
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
coef(stepb, id = which.min(resumo_stepb$bic))
## (Intercept) Overall_Qual Exter_Qual Bsmt_Unf_SF
## -2.632578e-17 3.166017e-01 2.124920e-01 -1.651529e-01
## Total_Bsmt_SF First_Flr_SF Second_Flr_SF Misc_Feature_Shed
## 2.491390e-01 3.179770e-01 3.014907e-01 -3.175356e-03
## Misc_Feature_TenC Sale_Type_Con Sale_Type_ConLD Sale_Type_WD.
## -9.898401e-03 3.379658e-03 -1.065269e-02 -5.593790e-02
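The selection step above, picking the model size with the lowest BIC and extracting its coefficients, can be seen more compactly on a small dataset. The sketch below applies the same leaps calls to mtcars (a stand-in with no linear dependencies); the names toy_step, toy_sum, best_size and best_coef are hypothetical.

```r
library(leaps)

# Backward selection on mtcars, considering models with up to 5 predictors
toy_step <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5, method = "backward")
toy_sum  <- summary(toy_step)

# Size of the model with the lowest BIC
best_size <- which.min(toy_sum$bic)

# Coefficients of that model (intercept plus best_size predictors)
best_coef <- coef(toy_step, id = best_size)
best_coef
```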
plot(stepb, scale = "adjr2")
summary(stepb)
## Subset selection object
## Call: regsubsets.formula(Sale_Price ~ ., data = train_baked, nvmax = 10,
## method = "backward")
## 189 Variables (and intercept)
## Forced in Forced out
## Lot_Frontage FALSE FALSE
## Lot_Area FALSE FALSE
## Lot_Shape FALSE FALSE
## Land_Contour FALSE FALSE
## Utilities FALSE FALSE
## Land_Slope FALSE FALSE
## Overall_Qual FALSE FALSE
## Overall_Cond FALSE FALSE
## Year_Built FALSE FALSE
## Year_Remod_Add FALSE FALSE
## Mas_Vnr_Area FALSE FALSE
## Exter_Qual FALSE FALSE
## Exter_Cond FALSE FALSE
## Bsmt_Qual FALSE FALSE
## Bsmt_Cond FALSE FALSE
## Bsmt_Exposure FALSE FALSE
## BsmtFin_Type_1 FALSE FALSE
## BsmtFin_SF_1 FALSE FALSE
## BsmtFin_Type_2 FALSE FALSE
## BsmtFin_SF_2 FALSE FALSE
## Bsmt_Unf_SF FALSE FALSE
## Total_Bsmt_SF FALSE FALSE
## Heating_QC FALSE FALSE
## Electrical FALSE FALSE
## First_Flr_SF FALSE FALSE
## Second_Flr_SF FALSE FALSE
## Low_Qual_Fin_SF FALSE FALSE
## Bsmt_Full_Bath FALSE FALSE
## Bsmt_Half_Bath FALSE FALSE
## Full_Bath FALSE FALSE
## Half_Bath FALSE FALSE
## Bedroom_AbvGr FALSE FALSE
## Kitchen_AbvGr FALSE FALSE
## Kitchen_Qual FALSE FALSE
## TotRms_AbvGrd FALSE FALSE
## Functional FALSE FALSE
## Fireplaces FALSE FALSE
## Fireplace_Qu FALSE FALSE
## Garage_Finish FALSE FALSE
## Garage_Cars FALSE FALSE
## Garage_Area FALSE FALSE
## Garage_Qual FALSE FALSE
## Garage_Cond FALSE FALSE
## Paved_Drive FALSE FALSE
## Wood_Deck_SF FALSE FALSE
## Open_Porch_SF FALSE FALSE
## Enclosed_Porch FALSE FALSE
## Three_season_porch FALSE FALSE
## Screen_Porch FALSE FALSE
## Pool_Area FALSE FALSE
## Pool_QC FALSE FALSE
## Fence FALSE FALSE
## Misc_Val FALSE FALSE
## Mo_Sold FALSE FALSE
## Year_Sold FALSE FALSE
## Longitude FALSE FALSE
## Latitude FALSE FALSE
## MS_SubClass_One_Story_1945_and_Older FALSE FALSE
## MS_SubClass_One_and_Half_Story_Finished_All_Ages FALSE FALSE
## MS_SubClass_Two_Story_1946_and_Newer FALSE FALSE
## MS_SubClass_Two_Story_1945_and_Older FALSE FALSE
## MS_SubClass_Split_or_Multilevel FALSE FALSE
## MS_SubClass_Duplex_All_Styles_and_Ages FALSE FALSE
## MS_SubClass_One_Story_PUD_1946_and_Newer FALSE FALSE
## MS_SubClass_Two_Story_PUD_1946_and_Newer FALSE FALSE
## MS_SubClass_Two_Family_conversion_All_Styles_and_Ages FALSE FALSE
## MS_SubClass_other FALSE FALSE
## MS_Zoning_Residential_High_Density FALSE FALSE
## MS_Zoning_Residential_Low_Density FALSE FALSE
## MS_Zoning_Residential_Medium_Density FALSE FALSE
## MS_Zoning_A_agr FALSE FALSE
## MS_Zoning_C_all FALSE FALSE
## MS_Zoning_I_all FALSE FALSE
## Street_Pave FALSE FALSE
## Alley_No_Alley_Access FALSE FALSE
## Alley_Paved FALSE FALSE
## Lot_Config_CulDSac FALSE FALSE
## Lot_Config_FR2 FALSE FALSE
## Lot_Config_FR3 FALSE FALSE
## Lot_Config_Inside FALSE FALSE
## Neighborhood_College_Creek FALSE FALSE
## Neighborhood_Old_Town FALSE FALSE
## Neighborhood_Edwards FALSE FALSE
## Neighborhood_Somerset FALSE FALSE
## Neighborhood_Northridge_Heights FALSE FALSE
## Neighborhood_Gilbert FALSE FALSE
## Neighborhood_Sawyer FALSE FALSE
## Neighborhood_Northwest_Ames FALSE FALSE
## Neighborhood_Sawyer_West FALSE FALSE
## Neighborhood_Mitchell FALSE FALSE
## Neighborhood_Brookside FALSE FALSE
## Neighborhood_Crawford FALSE FALSE
## Neighborhood_Iowa_DOT_and_Rail_Road FALSE FALSE
## Neighborhood_Timberland FALSE FALSE
## Neighborhood_Northridge FALSE FALSE
## Neighborhood_other FALSE FALSE
## Condition_1_Feedr FALSE FALSE
## Condition_1_Norm FALSE FALSE
## Condition_1_PosA FALSE FALSE
## Condition_1_PosN FALSE FALSE
## Condition_1_RRAe FALSE FALSE
## Condition_1_RRAn FALSE FALSE
## Condition_1_RRNe FALSE FALSE
## Condition_1_RRNn FALSE FALSE
## Condition_2_Feedr FALSE FALSE
## Condition_2_Norm FALSE FALSE
## Condition_2_PosA FALSE FALSE
## Condition_2_PosN FALSE FALSE
## Condition_2_RRAe FALSE FALSE
## Condition_2_RRAn FALSE FALSE
## Condition_2_RRNn FALSE FALSE
## Bldg_Type_TwoFmCon FALSE FALSE
## Bldg_Type_Twnhs FALSE FALSE
## Bldg_Type_TwnhsE FALSE FALSE
## House_Style_One_and_Half_Unf FALSE FALSE
## House_Style_One_Story FALSE FALSE
## House_Style_SFoyer FALSE FALSE
## House_Style_SLvl FALSE FALSE
## House_Style_Two_and_Half_Fin FALSE FALSE
## House_Style_Two_and_Half_Unf FALSE FALSE
## House_Style_Two_Story FALSE FALSE
## Roof_Style_Gable FALSE FALSE
## Roof_Style_Gambrel FALSE FALSE
## Roof_Style_Hip FALSE FALSE
## Roof_Style_Mansard FALSE FALSE
## Roof_Style_Shed FALSE FALSE
## Roof_Matl_CompShg FALSE FALSE
## Roof_Matl_Metal FALSE FALSE
## Roof_Matl_Roll FALSE FALSE
## Roof_Matl_Tar.Grv FALSE FALSE
## Roof_Matl_WdShake FALSE FALSE
## Exterior_1st_CemntBd FALSE FALSE
## Exterior_1st_HdBoard FALSE FALSE
## Exterior_1st_MetalSd FALSE FALSE
## Exterior_1st_Plywood FALSE FALSE
## Exterior_1st_VinylSd FALSE FALSE
## Exterior_1st_Wd.Sdng FALSE FALSE
## Exterior_1st_WdShing FALSE FALSE
## Exterior_1st_other FALSE FALSE
## Exterior_2nd_HdBoard FALSE FALSE
## Exterior_2nd_MetalSd FALSE FALSE
## Exterior_2nd_Plywood FALSE FALSE
## Exterior_2nd_VinylSd FALSE FALSE
## Exterior_2nd_Wd.Sdng FALSE FALSE
## Exterior_2nd_Wd.Shng FALSE FALSE
## Exterior_2nd_other FALSE FALSE
## Mas_Vnr_Type_BrkFace FALSE FALSE
## Mas_Vnr_Type_CBlock FALSE FALSE
## Mas_Vnr_Type_None FALSE FALSE
## Mas_Vnr_Type_Stone FALSE FALSE
## Foundation_CBlock FALSE FALSE
## Foundation_PConc FALSE FALSE
## Foundation_Slab FALSE FALSE
## Foundation_Stone FALSE FALSE
## Foundation_Wood FALSE FALSE
## Heating_GasA FALSE FALSE
## Heating_GasW FALSE FALSE
## Heating_Grav FALSE FALSE
## Heating_OthW FALSE FALSE
## Heating_Wall FALSE FALSE
## Central_Air_Y FALSE FALSE
## Garage_Type_Basment FALSE FALSE
## Garage_Type_BuiltIn FALSE FALSE
## Garage_Type_CarPort FALSE FALSE
## Garage_Type_Detchd FALSE FALSE
## Garage_Type_More_Than_Two_Types FALSE FALSE
## Garage_Type_No_Garage FALSE FALSE
## Misc_Feature_Gar2 FALSE FALSE
## Misc_Feature_None FALSE FALSE
## Misc_Feature_Othr FALSE FALSE
## Misc_Feature_Shed FALSE FALSE
## Misc_Feature_TenC FALSE FALSE
## Sale_Type_Con FALSE FALSE
## Sale_Type_ConLD FALSE FALSE
## Sale_Type_ConLI FALSE FALSE
## Sale_Type_ConLw FALSE FALSE
## Sale_Type_CWD FALSE FALSE
## Sale_Type_New FALSE FALSE
## Sale_Type_Oth FALSE FALSE
## Sale_Type_VWD FALSE FALSE
## Sale_Type_WD. FALSE FALSE
## Sale_Condition_AdjLand FALSE FALSE
## Sale_Condition_Alloca FALSE FALSE
## Sale_Condition_Family FALSE FALSE
## Sale_Condition_Normal FALSE FALSE
## Sale_Condition_Partial FALSE FALSE
## Gr_Liv_Area FALSE FALSE
## Bldg_Type_Duplex FALSE FALSE
## Roof_Matl_WdShngl FALSE FALSE
## 1 subsets of each size up to 11
## Selection Algorithm: backward
## Lot_Frontage Lot_Area Lot_Shape Land_Contour Utilities Land_Slope
## 1 ( 1 ) " " " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " " " "
## Overall_Qual Overall_Cond Year_Built Year_Remod_Add Mas_Vnr_Area
## 1 ( 1 ) "*" " " " " " " " "
## 2 ( 1 ) "*" " " " " " " " "
## 3 ( 1 ) "*" " " " " " " " "
## 4 ( 1 ) "*" " " " " " " " "
## 5 ( 1 ) "*" " " " " " " " "
## 6 ( 1 ) "*" " " " " " " " "
## 7 ( 1 ) "*" " " " " " " " "
## 8 ( 1 ) "*" " " " " " " " "
## 9 ( 1 ) "*" " " " " " " " "
## 10 ( 1 ) "*" " " " " " " " "
## 11 ( 1 ) "*" " " " " " " " "
## Exter_Qual Exter_Cond Bsmt_Qual Bsmt_Cond Bsmt_Exposure
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) "*" " " " " " " " "
## 5 ( 1 ) "*" " " " " " " " "
## 6 ( 1 ) "*" " " " " " " " "
## 7 ( 1 ) "*" " " " " " " " "
## 8 ( 1 ) "*" " " " " " " " "
## 9 ( 1 ) "*" " " " " " " " "
## 10 ( 1 ) "*" " " " " " " " "
## 11 ( 1 ) "*" " " " " " " " "
## BsmtFin_Type_1 BsmtFin_SF_1 BsmtFin_Type_2 BsmtFin_SF_2 Bsmt_Unf_SF
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " "*"
## 6 ( 1 ) " " " " " " " " "*"
## 7 ( 1 ) " " " " " " " " "*"
## 8 ( 1 ) " " " " " " " " "*"
## 9 ( 1 ) " " " " " " " " "*"
## 10 ( 1 ) " " " " " " " " "*"
## 11 ( 1 ) " " " " " " " " "*"
## Total_Bsmt_SF Heating_QC Electrical First_Flr_SF Second_Flr_SF
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " "*" " "
## 3 ( 1 ) " " " " " " "*" "*"
## 4 ( 1 ) " " " " " " "*" "*"
## 5 ( 1 ) " " " " " " "*" "*"
## 6 ( 1 ) "*" " " " " "*" "*"
## 7 ( 1 ) "*" " " " " "*" "*"
## 8 ( 1 ) "*" " " " " "*" "*"
## 9 ( 1 ) "*" " " " " "*" "*"
## 10 ( 1 ) "*" " " " " "*" "*"
## 11 ( 1 ) "*" " " " " "*" "*"
## Low_Qual_Fin_SF Gr_Liv_Area Bsmt_Full_Bath Bsmt_Half_Bath Full_Bath
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Half_Bath Bedroom_AbvGr Kitchen_AbvGr Kitchen_Qual TotRms_AbvGrd
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Functional Fireplaces Fireplace_Qu Garage_Finish Garage_Cars
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Garage_Area Garage_Qual Garage_Cond Paved_Drive Wood_Deck_SF
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Open_Porch_SF Enclosed_Porch Three_season_porch Screen_Porch
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Pool_Area Pool_QC Fence Misc_Val Mo_Sold Year_Sold Longitude Latitude
## 1 ( 1 ) " " " " " " " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " " " " " " " "
## MS_SubClass_One_Story_1945_and_Older
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_One_and_Half_Story_Finished_All_Ages
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_Two_Story_1946_and_Newer
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_Two_Story_1945_and_Older MS_SubClass_Split_or_Multilevel
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## MS_SubClass_Duplex_All_Styles_and_Ages
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_One_Story_PUD_1946_and_Newer
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_Two_Story_PUD_1946_and_Newer
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_Two_Family_conversion_All_Styles_and_Ages
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_SubClass_other MS_Zoning_Residential_High_Density
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## MS_Zoning_Residential_Low_Density
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## MS_Zoning_Residential_Medium_Density MS_Zoning_A_agr MS_Zoning_C_all
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## MS_Zoning_I_all Street_Pave Alley_No_Alley_Access Alley_Paved
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Lot_Config_CulDSac Lot_Config_FR2 Lot_Config_FR3 Lot_Config_Inside
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Neighborhood_College_Creek Neighborhood_Old_Town Neighborhood_Edwards
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Neighborhood_Somerset Neighborhood_Northridge_Heights
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## Neighborhood_Gilbert Neighborhood_Sawyer Neighborhood_Northwest_Ames
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Neighborhood_Sawyer_West Neighborhood_Mitchell Neighborhood_Brookside
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Neighborhood_Crawford Neighborhood_Iowa_DOT_and_Rail_Road
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## Neighborhood_Timberland Neighborhood_Northridge Neighborhood_other
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Condition_1_Feedr Condition_1_Norm Condition_1_PosA Condition_1_PosN
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Condition_1_RRAe Condition_1_RRAn Condition_1_RRNe Condition_1_RRNn
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Condition_2_Feedr Condition_2_Norm Condition_2_PosA Condition_2_PosN
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Condition_2_RRAe Condition_2_RRAn Condition_2_RRNn Bldg_Type_TwoFmCon
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Bldg_Type_Duplex Bldg_Type_Twnhs Bldg_Type_TwnhsE
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## House_Style_One_and_Half_Unf House_Style_One_Story House_Style_SFoyer
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## House_Style_SLvl House_Style_Two_and_Half_Fin
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## House_Style_Two_and_Half_Unf House_Style_Two_Story Roof_Style_Gable
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Roof_Style_Gambrel Roof_Style_Hip Roof_Style_Mansard Roof_Style_Shed
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Roof_Matl_CompShg Roof_Matl_Metal Roof_Matl_Roll Roof_Matl_Tar.Grv
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Roof_Matl_WdShake Roof_Matl_WdShngl Exterior_1st_CemntBd
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_1st_HdBoard Exterior_1st_MetalSd Exterior_1st_Plywood
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_1st_VinylSd Exterior_1st_Wd.Sdng Exterior_1st_WdShing
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_1st_other Exterior_2nd_HdBoard Exterior_2nd_MetalSd
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_2nd_Plywood Exterior_2nd_VinylSd Exterior_2nd_Wd.Sdng
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Exterior_2nd_Wd.Shng Exterior_2nd_other Mas_Vnr_Type_BrkFace
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Mas_Vnr_Type_CBlock Mas_Vnr_Type_None Mas_Vnr_Type_Stone
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Foundation_CBlock Foundation_PConc Foundation_Slab Foundation_Stone
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Foundation_Wood Heating_GasA Heating_GasW Heating_Grav Heating_OthW
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## Heating_Wall Central_Air_Y Garage_Type_Basment Garage_Type_BuiltIn
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Garage_Type_CarPort Garage_Type_Detchd
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## Garage_Type_More_Than_Two_Types Garage_Type_No_Garage
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " " "
## 10 ( 1 ) " " " "
## 11 ( 1 ) " " " "
## Misc_Feature_Gar2 Misc_Feature_None Misc_Feature_Othr
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " "*" " "
## 9 ( 1 ) " " "*" " "
## 10 ( 1 ) "*" "*" " "
## 11 ( 1 ) "*" "*" "*"
## Misc_Feature_Shed Misc_Feature_TenC Sale_Type_Con Sale_Type_ConLD
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) "*" " " " " " "
## 10 ( 1 ) "*" " " " " " "
## 11 ( 1 ) "*" " " " " " "
## Sale_Type_ConLI Sale_Type_ConLw Sale_Type_CWD Sale_Type_New
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " "*"
## 8 ( 1 ) " " " " " " "*"
## 9 ( 1 ) " " " " " " "*"
## 10 ( 1 ) " " " " " " "*"
## 11 ( 1 ) " " " " " " "*"
## Sale_Type_Oth Sale_Type_VWD Sale_Type_WD. Sale_Condition_AdjLand
## 1 ( 1 ) " " " " " " " "
## 2 ( 1 ) " " " " " " " "
## 3 ( 1 ) " " " " " " " "
## 4 ( 1 ) " " " " " " " "
## 5 ( 1 ) " " " " " " " "
## 6 ( 1 ) " " " " " " " "
## 7 ( 1 ) " " " " " " " "
## 8 ( 1 ) " " " " " " " "
## 9 ( 1 ) " " " " " " " "
## 10 ( 1 ) " " " " " " " "
## 11 ( 1 ) " " " " " " " "
## Sale_Condition_Alloca Sale_Condition_Family Sale_Condition_Normal
## 1 ( 1 ) " " " " " "
## 2 ( 1 ) " " " " " "
## 3 ( 1 ) " " " " " "
## 4 ( 1 ) " " " " " "
## 5 ( 1 ) " " " " " "
## 6 ( 1 ) " " " " " "
## 7 ( 1 ) " " " " " "
## 8 ( 1 ) " " " " " "
## 9 ( 1 ) " " " " " "
## 10 ( 1 ) " " " " " "
## 11 ( 1 ) " " " " " "
## Sale_Condition_Partial
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
In this model, the tune() function is used to find the best lambda (penalty) on the bootstrap samples.
lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
set_engine("glmnet", standardize = FALSE) # FALSE because the data were already pre-processed earlier
lambda_grid <- grid_regular(penalty(), levels = 500)
doParallel::registerDoParallel() # parallel processing to speed up tuning
set.seed(123)
lasso_grid <- tune_grid(Sale_Price ~.,
model = lasso_spec,
resamples = ames_boot,
grid = lambda_grid)
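As a rough illustration of what this grid contains, the base R sketch below rebuilds it by hand. It assumes that `penalty()` spans 10^-10 to 10^0 on a log10 scale, which is consistent with the penalties printed by `collect_metrics()` (1.00e-10, 1.05e-10, 1.10e-10, ...); it is not the `dials` implementation itself, and the name `lambda_by_hand` is only illustrative.

```r
# Assumption: penalty() covers 10^-10 to 10^0 on a log10 scale.
n_levels <- 500
lambda_by_hand <- 10^seq(-10, 0, length.out = n_levels)

# First few values, rounded to three significant digits like the tibble output
signif(head(lambda_by_hand, 5), 3)

# Consecutive values differ by a constant ratio, i.e. equal steps in log10(lambda)
step_ratio <- lambda_by_hand[2] / lambda_by_hand[1]
step_ratio
```

Because the grid is regular in log10(lambda), most candidates are extremely small penalties, which is why the first rows of the metrics table are identical to three decimals.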
Evaluating the parameters: the lambda that gives the lowest root mean squared error (RMSE):
lasso_grid %>%
collect_metrics()
## # A tibble: 1,000 x 6
## penalty .metric .estimator mean n std_err
## <dbl> <chr> <chr> <dbl> <int> <dbl>
## 1 1.00e-10 rmse standard 0.370 10 0.0128
## 2 1.00e-10 rsq standard 0.870 10 0.00796
## 3 1.05e-10 rmse standard 0.370 10 0.0128
## 4 1.05e-10 rsq standard 0.870 10 0.00796
## 5 1.10e-10 rmse standard 0.370 10 0.0128
## 6 1.10e-10 rsq standard 0.870 10 0.00796
## 7 1.15e-10 rmse standard 0.370 10 0.0128
## 8 1.15e-10 rsq standard 0.870 10 0.00796
## 9 1.20e-10 rmse standard 0.370 10 0.0128
## 10 1.20e-10 rsq standard 0.870 10 0.00796
## # … with 990 more rows
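The two metrics in this table can be reproduced by hand. The sketch below uses made-up numbers (not the Ames data) and assumes the usual definitions: RMSE as the square root of the mean squared error, and R² as the squared correlation between observed and predicted values.

```r
# Made-up observed and predicted values, for illustration only
observed  <- c(1.2, -0.4, 0.3, 2.1, -1.0)
predicted <- c(1.0, -0.1, 0.5, 1.7, -0.8)

# Root mean squared error: typical size of a prediction error,
# expressed in the units of the response
rmse_by_hand <- sqrt(mean((observed - predicted)^2))

# R squared, taken here as the squared Pearson correlation
rsq_by_hand <- cor(observed, predicted)^2

c(rmse = rmse_by_hand, rsq = rsq_by_hand)
```

Lower is better for `rmse` and higher is better for `rsq`, which is why the best penalty is selected by minimizing RMSE.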
lasso_grid %>%
collect_metrics() %>%
ggplot(aes(penalty, mean, color = .metric)) +
geom_errorbar(aes(
ymin = mean - std_err,
ymax = mean + std_err
),
alpha = 0.5
) +
geom_line(size = 1.5) +
facet_wrap(~.metric, scales = "free", nrow = 2) +
scale_x_log10() +
labs(x = 'Log(lambda)') +
theme(legend.position = "none")
## Warning: Removed 5 rows containing missing values (geom_errorbar).
## Warning: Removed 4 rows containing missing values (geom_path).
(lasso_lowest_rmse <- lasso_grid %>%
select_best("rmse", maximize = FALSE))
## # A tibble: 1 x 1
## penalty
## <dbl>
## 1 0.00359
Final Lasso model: The tidymodels package lets us finalize the model with the best parameter found. The finalized model is then evaluated on the cross-validation folds and also fitted on the full training set.
lasso_final <- lasso_spec %>%
finalize_model(parameters = lasso_lowest_rmse)
lasso_res <- fit_resamples(Sale_Price ~ .,
lasso_final,
cv_splits,
control = control_resamples(save_pred = TRUE))
(lasso_res %>%
collect_metrics())
## # A tibble: 2 x 5
## .metric .estimator mean n std_err
## <chr> <chr> <dbl> <int> <dbl>
## 1 rmse standard 0.340 5 0.0289
## 2 rsq standard 0.883 5 0.0171
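To make the resampling step concrete, here is a minimal base R sketch of 5-fold cross-validation. It runs on simulated data with an ordinary linear model standing in for the finalized Lasso, so the names `sim`, `fold_id` and `fold_rmse` are assumptions for illustration and the numbers are unrelated to the Ames results.

```r
set.seed(123)

# Simulated data standing in for the pre-processed training set (assumption)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n, sd = 0.5)

# Assign each row to one of k folds at random
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))

# Fit on k - 1 folds and measure RMSE on the held-out fold
fold_rmse <- sapply(seq_len(k), function(i) {
  fit <- lm(y ~ x1 + x2, data = sim[fold_id != i, ])
  held_out <- sim[fold_id == i, ]
  sqrt(mean((held_out$y - predict(fit, newdata = held_out))^2))
})

# Summary analogous to the mean, n and std_err columns shown above
cv_summary <- c(mean = mean(fold_rmse), n = k, std_err = sd(fold_rmse) / sqrt(k))
cv_summary
```

Each observation is held out exactly once, so the averaged RMSE estimates how the model performs on data it was not fitted to.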
lasso_fit <- lasso_final %>%
fit(Sale_Price ~.,
data = train_baked)
Unfortunately, fitting the models this way does not allow the variables to be visualized with the plotmo library, because the class created by tidymodels is not recognized. The plot can still be built by hand, but in this specific case the number of variables makes it hard to read.
The vip package will therefore be used instead; it offers similar functionality and identifies the most important variables in the model.
#plot(lasso.coef[lasso.coef != 0])
lasso_var <- lasso_fit %>%
vi(lambda = lasso_lowest_rmse$penalty) %>%
mutate(Importance_pct = abs(Importance)/max(abs(Importance))) %>%
mutate(Variable = fct_reorder(Variable, Importance_pct))
# Checking the variable selection
lasso_var %>%
count(Importance_pct != 0)
## # A tibble: 2 x 2
## `Importance_pct != 0` n
## <lgl> <int>
## 1 FALSE 4
## 2 TRUE 185
# Variables with the largest impact on price
lasso_var %>%
filter(Importance_pct > 0.05) %>%
ggplot(aes(Variable,Importance_pct, fill = Sign)) +
geom_col()+
scale_y_continuous(labels = scales::percent_format())+
coord_flip()
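The scaling used for `Importance_pct` can be sketched in base R. The coefficients and variable names below are made up for illustration, and the sketch assumes that importance for a penalized linear model is read off the magnitude of each coefficient; it does not call vip.

```r
# Made-up coefficients with hypothetical variable names, for illustration only
coefs <- c(var_a = 0.42, var_b = -0.21, var_c = 0, var_d = 0.03, var_e = -0.007)

importance_tbl <- data.frame(
  Variable   = names(coefs),
  Importance = abs(coefs),
  Sign       = ifelse(coefs >= 0, "POS", "NEG")
)

# Scale by the largest absolute coefficient so the top variable is 100%
importance_tbl$Importance_pct <-
  importance_tbl$Importance / max(importance_tbl$Importance)

# How many variables the penalty kept (non-zero) versus dropped (zero)
n_kept <- sum(importance_tbl$Importance_pct != 0)

# Variables above the 5% threshold used in the plot
above_threshold <- importance_tbl$Variable[importance_tbl$Importance_pct > 0.05]
above_threshold
```

Scaling by the maximum makes the chart relative: a bar at 50% means a coefficient half as large as the strongest one, not half of the explained variance.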
# Variables with a negative impact on price
#lasso_var %>%
# filter(Importance < 0) %>%
# ggplot(aes(Variable,Importance_pct, fill = Sign)) +
# geom_col()+
# scale_y_continuous(labels = scales::percent_format())+
# coord_flip()
As with the Lasso, lambda will be tuned.
ridge_spec <- linear_reg(penalty = tune(), mixture = 0) %>% # mixture = 0 for ridge regression
set_engine("glmnet", standardize = FALSE)
ridge_lambda_grid <- grid_regular(penalty(), levels = 1000)
doParallel::registerDoParallel() # parallel processing to speed up tuning
set.seed(2020)
ridge_grid <- tune_grid(Sale_Price ~.,
model = ridge_spec,
resamples = ames_boot,
grid = ridge_lambda_grid
)
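The difference between `mixture = 1` (Lasso) and `mixture = 0` (ridge) is easiest to see in the textbook special case of an orthonormal design, where both penalties have closed-form solutions. The sketch below uses made-up coefficients and is only an idealized illustration, not what glmnet computes for these data.

```r
# Least-squares coefficients in an idealized orthonormal design (made-up values)
beta_ols <- c(2.0, 0.8, 0.3, -0.1, -1.5)
lambda <- 0.5

# Ridge, minimizing (1/2)(b - beta)^2 + (lambda/2) b^2:
# proportional shrinkage, never exactly zero unless beta_ols is zero
beta_ridge <- beta_ols / (1 + lambda)

# Lasso, minimizing (1/2)(b - beta)^2 + lambda |b|:
# soft-thresholding, coefficients within lambda of zero become exactly zero
beta_lasso <- sign(beta_ols) * pmax(abs(beta_ols) - lambda, 0)

data.frame(beta_ols, beta_ridge, beta_lasso)
```

This is why the Lasso fit above set a few coefficients to exactly zero, while ridge is expected to keep every variable with a shrunken coefficient.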
ridge_grid %>%
collect_metrics()
## # A tibble: 2,000 x 6
## penalty .metric .estimator mean n std_err
## <dbl> <chr> <chr> <dbl> <int> <dbl>
## 1 1.00e-10 rmse standard 0.355 10 0.0120
## 2 1.00e-10 rsq standard 0.879 10 0.00728
## 3 1.02e-10 rmse standard 0.355 10 0.0120
## 4 1.02e-10 rsq standard 0.879 10 0.00728
## 5 1.05e-10 rmse standard 0.355 10 0.0120
## 6 1.05e-10 rsq standard 0.879 10 0.00728
## 7 1.07e-10 rmse standard 0.355 10 0.0120
## 8 1.07e-10 rsq standard 0.879 10 0.00728
## 9 1.10e-10 rmse standard 0.355 10 0.0120
## 10 1.10e-10 rsq standard 0.879 10 0.00728
## # … with 1,990 more rows
ridge_grid %>%
collect_metrics() %>%
ggplot(aes(penalty, mean, color = .metric)) +
geom_errorbar(aes(
ymin = mean - std_err,
ymax = mean + std_err
),
alpha = 0.5
) +
geom_line(size = 1.5) +
facet_wrap(~.metric, scales = "free", nrow = 2) +
scale_x_log10() +
theme(legend.position = "none")
(ridge_lowest_rmse <- ridge_grid %>%
select_best("rmse", maximize = FALSE))
## # A tibble: 1 x 1
## penalty
## <dbl>
## 1 0.117
Final model:
ridge_final <- ridge_spec %>%
finalize_model(parameters = ridge_lowest_rmse)
ridge_res <- fit_resamples(Sale_Price ~ .,
ridge_final,
cv_splits,
control = control_resamples(save_pred = TRUE))
(ridge_res %>%
collect_metrics())
## # A tibble: 2 x 5
## .metric .estimator mean n std_err
## <chr> <chr> <dbl> <int> <dbl>
## 1 rmse standard 0.340 5 0.0273
## 2 rsq standard 0.884 5 0.0159
ridge_fit <- ridge_final %>%
fit(Sale_Price ~.,
data = train_baked)
ridge_fit %>%
summary
## Length Class Mode
## lvl 0 -none- NULL
## spec 6 linear_reg list
## fit 12 elnet list
## preproc 5 -none- list
## elapsed 5 proc_time numeric
ridge_fit %>%
tidy %>%
#filter(step <=100) %>%
#filter(estimate < 0.2) %>%
ggplot(aes(lambda, estimate, color = term))+
geom_line(show.legend = FALSE)
#plot(lasso.coef[lasso.coef != 0])
var_ridge <- ridge_fit %>%
vi(lambda = ridge_lowest_rmse$penalty) %>%
mutate(Importance_pct = abs(Importance)/max(abs(Importance))) %>%
mutate(Variable = fct_reorder(Variable, Importance_pct))
# Variables with the largest impact on price
var_ridge %>%
filter(Importance_pct > 0.20) %>%
ggplot(aes(Variable,Importance_pct, fill = Sign)) +
geom_col()+
scale_y_continuous(labels = scales::percent_format())+
coord_flip()
# Variables with a negative impact on price
#var_ridge %>%
# filter(Importance < 0) %>%
# ggplot(aes(Variable,Importance_pct, fill = Sign)) +
# geom_col()+
# scale_y_continuous(labels = scales::percent_format())+
# coord_flip()
As with stepwise selection, tidymodels has no dedicated function for bagging. It could be done by growing many trees, but for simplicity the ipred package will be used, which performs this procedure much faster.
library(ipred)
set.seed(123)
(bag_fit <- ipred::bagging(Sale_Price ~ ., data = train_baked, coob = TRUE))
##
## Bagging regression trees with 25 bootstrap replications
##
## Call: bagging.data.frame(formula = Sale_Price ~ ., data = train_baked,
## coob = TRUE)
##
## Out-of-bag estimate of root mean squared error: 0.4432
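The out-of-bag estimate reported here can be illustrated with a short base R sketch. It uses simulated data, and a cubic linear model stands in for the regression trees that ipred grows so that the block needs no extra packages; all object names are assumptions for illustration.

```r
set.seed(123)

# Simulated data (assumption); lm() stands in for ipred's regression trees
n <- 150
sim <- data.frame(x = runif(n, -2, 2))
sim$y <- sin(sim$x) + rnorm(n, sd = 0.3)

n_bags    <- 25          # same number of bootstrap replications as in the output above
oob_sum   <- numeric(n)  # running sum of out-of-bag predictions per row
oob_count <- integer(n)  # number of times each row was out of bag

for (b in seq_len(n_bags)) {
  in_bag  <- sample(n, replace = TRUE)     # bootstrap sample of row indices
  out_bag <- setdiff(seq_len(n), in_bag)   # rows not drawn in this replication
  fit <- lm(y ~ x + I(x^2) + I(x^3), data = sim[in_bag, ])
  oob_sum[out_bag]   <- oob_sum[out_bag] +
    predict(fit, newdata = sim[out_bag, , drop = FALSE])
  oob_count[out_bag] <- oob_count[out_bag] + 1L
}

# Average the out-of-bag predictions and compare them with the observed response
has_oob  <- oob_count > 0
oob_pred <- oob_sum[has_oob] / oob_count[has_oob]
oob_rmse <- sqrt(mean((sim$y[has_oob] - oob_pred)^2))
oob_rmse
```

Each row is predicted only by the models that did not see it, so the estimate behaves like a built-in validation set and needs no separate resampling step.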
# Out-of-bag RMSE of the bagged model; ipred stores this estimate in `err` when coob = TRUE
bag_rmse <- bag_fit$err
Here the mtry parameter and the number of trees are tuned. To save processing time, we start with a grid for mtry only and then test the number of trees.
p <- ncol(train_baked) - 1 # total number of predictors
rf_tune_spec <- rand_forest(mode = "regression",
mtry = tune(), # the usual starting point is p/3 (63 if p = 189)
trees = 500) %>%
set_engine("ranger", importance = "permutation")
(rf_grid <- grid_regular(
mtry(range = c(p/5, 100)),
#trees(range = c(500, 2000)),
levels = 5))
## # A tibble: 5 x 1
## mtry
## <int>
## 1 37
## 2 53
## 3 68
## 4 84
## 5 100
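The printed grid can be reproduced by hand. The sketch below assumes p = 189 predictors (inferred from the 4 + 185 variables counted in the Lasso importance table) and that the equally spaced candidates are truncated to integers, which is what the printed values suggest; whether dials truncates internally is an assumption.

```r
p <- 189  # assumed number of predictors (4 + 185 in the Lasso importance table)

# Five equally spaced candidates between p/5 and 100, truncated to integers
mtry_by_hand <- as.integer(seq(p / 5, 100, length.out = 5))
mtry_by_hand

# Common starting point for regression forests: about one third of the predictors
mtry_rule_of_thumb <- floor(p / 3)
mtry_rule_of_thumb
```

The grid therefore brackets the p/3 rule of thumb (63) from both sides, between roughly p/5 and p/2.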
doParallel::registerDoParallel() # parallel processing to speed up tuning
set.seed(2020)
rf_tune <- tune_grid(Sale_Price ~.,
model = rf_tune_spec,
resamples = ames_boot,
grid = rf_grid)
Evaluating the results:
rf_tune %>%
collect_metrics()
## # A tibble: 10 x 6
## mtry .metric .estimator mean n std_err
## <int> <chr> <chr> <dbl> <int> <dbl>
## 1 37 rmse standard 0.325 10 0.0113
## 2 37 rsq standard 0.905 10 0.00598
## 3 53 rmse standard 0.325 10 0.0112
## 4 53 rsq standard 0.904 10 0.00617
## 5 68 rmse standard 0.325 10 0.0113
## 6 68 rsq standard 0.903 10 0.00628
## 7 84 rmse standard 0.326 10 0.0115
## 8 84 rsq standard 0.902 10 0.00651
## 9 100 rmse standard 0.328 10 0.0118
## 10 100 rsq standard 0.900 10 0.00675
rf_tune%>%
collect_metrics() %>%
select(mean, mtry, .metric) %>%
filter(.metric == 'rmse') %>%
#pivot_longer(min_n:mtry,
# values_to = "value",
# names_to = "parameter"
#) %>%
ggplot(aes(mtry, mean, color = .metric)) +
geom_point(show.legend = TRUE) #+
#facet_wrap(~parameter, scales = "free_x") +
#labs(x = NULL, y = "Value")
rf_lowest_rmse <- rf_tune %>%
select_best("rmse", maximize = FALSE)
best_mtry = rf_lowest_rmse$mtry
The RMSE is essentially flat for mtry between 37 and 68 (about 0.325, with a standard error near 0.011), so the optimum lies around 50; other runs pointed to the same region. Above that the error rises slightly, which suggests the model is starting to overfit. With the best mtry fixed, we now evaluate the error as a function of the number of trees.
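To see how close the mtry candidates are, the RMSE rows of the output above can be compared against the standard error of the best one. This is a rough base R sketch using the rounded values printed in the table; `select_best()` works on the unrounded values, so its choice may differ from the one shown here.

```r
# RMSE rows copied (rounded) from the collect_metrics() output above
rf_rmse <- data.frame(
  mtry    = c(37, 53, 68, 84, 100),
  mean    = c(0.325, 0.325, 0.325, 0.326, 0.328),
  std_err = c(0.0113, 0.0112, 0.0113, 0.0115, 0.0118)
)

# candidate with the lowest mean RMSE (first one in case of ties)
best <- rf_rmse[which.min(rf_rmse$mean), ]

# flag candidates whose mean RMSE is within one standard error of the best
rf_rmse$within_one_se <- rf_rmse$mean <= best$mean + best$std_err
rf_rmse
```

With these rounded numbers every candidate falls within one standard error of the best, so the differences between mtry values are small relative to the resampling noise.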
rf_spec <- rand_forest(mode = "regression",
mtry = best_mtry,
trees = tune()) %>%
set_engine("ranger", importance = "permutation")
rf_grid <- grid_regular(
#mtry(range = c(p/4,100)),
trees(range = c(500, 2000)),
levels = 4)
doParallel::registerDoParallel()
set.seed(2020)
rf_tune <- tune_grid(Sale_Price ~.,
model = rf_spec,
resamples = ames_boot,
grid = rf_grid)
rf_tune %>%
collect_metrics()
## # A tibble: 8 x 6
## trees .metric .estimator mean n std_err
## <int> <chr> <chr> <dbl> <int> <dbl>
## 1 500 rmse standard 0.324 10 0.0115
## 2 500 rsq standard 0.904 10 0.00632
## 3 1000 rmse standard 0.325 10 0.0113
## 4 1000 rsq standard 0.903 10 0.00620
## 5 1500 rmse standard 0.324 10 0.0114
## 6 1500 rsq standard 0.904 10 0.00624
## 7 2000 rmse standard 0.324 10 0.0115
## 8 2000 rsq standard 0.904 10 0.00634
rf_tune %>%
collect_metrics() %>%
select(mean, trees, .metric) %>%
filter(.metric == 'rmse') %>%
#pivot_longer(min_n:mtry,
# values_to = "value",
# names_to = "parameter") %>%
ggplot(aes(trees, mean, color = .metric)) +
geom_point(show.legend = TRUE)
rf_lowest_rmse <- rf_tune %>%
select_best("rmse", maximize = FALSE)
rf_final <- rf_spec %>%
finalize_model(parameters = rf_lowest_rmse)
rf_res <- fit_resamples(Sale_Price ~ .,
rf_final,
cv_splits,
control = control_resamples(save_pred = TRUE))
rf_fit <- rf_final %>%
fit(Sale_Price ~.,
data = train_baked)
rf_fit %>%
summary
## Length Class Mode
## lvl 0 -none- NULL
## spec 6 rand_forest list
## fit 15 ranger list
## preproc 1 -none- list
## elapsed 5 proc_time numeric
Variable importance
vi(rf_fit) %>%
mutate(Importance_pct = abs(Importance)/max(abs(Importance))) %>%
mutate(Variable = fct_reorder(Variable, Importance_pct)) %>%
filter(Importance_pct > 0.05) %>%
ggplot(aes(Variable, Importance_pct)) +
geom_point()+
scale_y_continuous(labels = scales::percent_format())+
coord_flip()
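The idea behind permutation importance, followed by the same rescaling used in the plot above, can be sketched in base R. This is a simplified illustration: it uses a linear model on the built-in mtcars data and measures the error on the fitting data, whereas ranger computes permutation importance from out-of-bag samples. The helper `rmse_of` is a hypothetical name.

```r
set.seed(2020)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Hypothetical helper: RMSE of a model on a data frame that contains mpg
rmse_of <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

baseline <- rmse_of(fit, mtcars)
predictors <- c("wt", "hp", "qsec")

# importance = increase in RMSE after shuffling one predictor at a time
importance <- sapply(predictors, function(v) {
  shuffled <- mtcars
  shuffled[[v]] <- sample(shuffled[[v]])
  rmse_of(fit, shuffled) - baseline
})

# rescale by the largest absolute value, as in the plot above
importance_pct <- abs(importance) / max(abs(importance))
sort(importance_pct, decreasing = TRUE)
```

Rescaling by the maximum makes the most important variable equal to 100%, which is why the plot can filter on a relative threshold such as 5%.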
The plot below shows the RMSE and R² of each model in every resample, except for bagging, which was fit directly on the full training set.
lm_res %>%
select(id, .metrics) %>%
unnest(.metrics) %>%
mutate(model = "linear regression") %>%
bind_rows(lasso_res %>%
select(id, .metrics) %>%
unnest(.metrics) %>%
mutate(model = "lasso")) %>%
bind_rows(ridge_res %>%
select(id, .metrics) %>%
unnest(.metrics) %>%
mutate(model = "ridge")) %>%
#bind_rows(bag_res %>%
# select(id, .metrics) %>%
# unnest(.metrics) %>%
# mutate(model = "bagging")) %>%
bind_rows(rf_res %>%
select(id, .metrics) %>%
unnest(.metrics) %>%
mutate(model = "random forest")) %>%
ggplot(aes(id, .estimate, group = model, color = model)) +
geom_point(size = 1.5) +
facet_wrap(~.metric) +
coord_flip()
For a clearer picture, the metrics are compared below on the full training and test sets.
results_train <- lm_fit %>%
predict(new_data = train_baked) %>%
mutate(truth = train_baked$Sale_Price,
model = 'lm') %>%
bind_rows(lasso_fit %>%
predict(new_data = train_baked) %>%
mutate(truth = train_baked$Sale_Price,
model = 'lasso')) %>%
bind_rows(ridge_fit %>%
predict(new_data = train_baked) %>%
mutate(truth = train_baked$Sale_Price,
model = 'ridge')) %>%
bind_rows(tibble(.pred = predict(bag_fit, newdata = train_baked),
truth = train_baked$Sale_Price,
model = 'bagging')) %>%
bind_rows(rf_fit %>%
predict(new_data = train_baked) %>%
mutate(truth = train_baked$Sale_Price,
model = 'random forest'))
## Warning in predict.lm(object = object$fit, newdata = new_data, type =
## "response"): prediction from a rank-deficient fit may be misleading
results_train %>%
group_by(model) %>%
rmse(truth = truth, estimate = .pred)
## # A tibble: 5 x 4
## model .metric .estimator .estimate
## <chr> <chr> <chr> <dbl>
## 1 bagging rmse standard 0.400
## 2 lasso rmse standard 0.290
## 3 lm rmse standard 0.270
## 4 random forest rmse standard 0.128
## 5 ridge rmse standard 0.295
On the training set, the best model was the random forest (RMSE 0.128), followed by linear regression (0.270); bagging had the highest error (0.400). To confirm, let's look at the behavior on the test set:
results_test <- lm_fit %>%
predict(new_data = test_baked) %>%
mutate(truth = test_baked$Sale_Price,
model = 'lm') %>%
bind_rows(lasso_fit %>%
predict(new_data = test_baked) %>%
mutate(truth = test_baked$Sale_Price,
model = 'lasso')) %>%
bind_rows(ridge_fit %>%
predict(new_data = test_baked) %>%
mutate(truth = test_baked$Sale_Price,
model = 'ridge')) %>%
bind_rows(tibble(.pred = predict(bag_fit, newdata = test_baked),
truth = test_baked$Sale_Price,
model = 'bagging')) %>%
bind_rows(rf_fit %>%
predict(new_data = test_baked) %>%
mutate(truth = test_baked$Sale_Price,
model = 'random forest'))
## Warning in predict.lm(object = object$fit, newdata = new_data, type =
## "response"): prediction from a rank-deficient fit may be misleading
results_test %>%
group_by(model) %>%
rmse(truth = truth, estimate = .pred)
## # A tibble: 5 x 4
## model .metric .estimator .estimate
## <chr> <chr> <chr> <dbl>
## 1 bagging rmse standard 0.477
## 2 lasso rmse standard 0.503
## 3 lm rmse standard 0.557
## 4 random forest rmse standard 0.373
## 5 ridge rmse standard 0.479
On the test set the winner is the same: the random forest performed best (RMSE 0.373). However, the second-best model was bagging (0.477), which had the worst performance on the training set, while linear regression, second on the training set, had the highest test error (0.557), a sign of overfitting. Ridge and Lasso performed similarly to each other on both sets.
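The two RMSE tables above can be lined up side by side to make the train/test gap of each model explicit. A small base R sketch, with the values copied from the printed outputs:

```r
# RMSE values copied from the training and test outputs above
rmse_cmp <- data.frame(
  model = c("bagging", "lasso", "lm", "random forest", "ridge"),
  train = c(0.400, 0.290, 0.270, 0.128, 0.295),
  test  = c(0.477, 0.503, 0.557, 0.373, 0.479)
)

# how much the error grows when moving from training to test data
rmse_cmp$gap <- rmse_cmp$test - rmse_cmp$train

# rank 1 = lowest RMSE within each set
rmse_cmp$rank_train <- rank(rmse_cmp$train)
rmse_cmp$rank_test  <- rank(rmse_cmp$test)

# models ordered from best to worst on the test set
rmse_cmp[order(rmse_cmp$test), ]
```

Sorting by the test column gives the ranking that matters for prediction, while the gap column shows which models lose the most accuracy outside the training data.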
As seen in the exploratory analysis, the most important predictors were those with the highest linear correlation, which also helps explain the good fit of the linear model on the training set. The tidymodels package streamlines running and comparing models, even though it does not include every model used in this report. Looked at individually, the Lasso fit made the least sense in terms of the variables it selected. The random forest was the best-performing model on both the training and the test sets.
The following sources were used as references in developing this report: